TY - CHAP
T1 - Energy Efficient Memory-based Inference of LSTM by Exploiting FPGA Overlay
AU - Guha, Krishnendu
AU - Trivedi, Amit Ranjan
AU - Bhunia, Swarup
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The fourth industrial revolution (a.k.a. Industry 4.0) relies on intelligent machines that are fully autonomous and can diagnose and resolve operational issues without human intervention. Therefore, embedded computing platforms enabling the necessary computations for intelligent machines are critical for the ongoing industrial revolution. In particular, field-programmable gate arrays (FPGAs) are well suited for such embedded computing due to their high performance and easy reconfigurability. Many Industry 4.0 applications, such as predictive maintenance, critically depend on real-time and reliable processing of time-series data using recurrent neural network models, especially long short-term memory (LSTM). Therefore, FPGA-based acceleration of LSTM is imperative for many Industry 4.0 applications. Existing LSTM implementations for FPGAs incur significant resource and power costs and are not energy efficient. Moreover, prior works focusing on reducing latency and power mainly rely on model pruning, which compromises accuracy. In contrast, we propose a memory-based, energy-efficient inference of LSTM by exploiting an overlay in FPGA. In our methodology, we pre-compute predominant operations and store them in the available embedded memory blocks (EMBs) of an FPGA. On demand, these pre-computed results are accessed to minimize the necessary workload. With this methodology, we obtain lower latency, lower power, and better energy efficiency than state-of-the-art LSTM models without any loss of accuracy. Specifically, when implemented on the Zynq ZCU104 evaluation board, a 3x reduction in latency and a 5x reduction in power are obtained compared to the reference 16-bit LSTM model.
AB - The fourth industrial revolution (a.k.a. Industry 4.0) relies on intelligent machines that are fully autonomous and can diagnose and resolve operational issues without human intervention. Therefore, embedded computing platforms enabling the necessary computations for intelligent machines are critical for the ongoing industrial revolution. In particular, field-programmable gate arrays (FPGAs) are well suited for such embedded computing due to their high performance and easy reconfigurability. Many Industry 4.0 applications, such as predictive maintenance, critically depend on real-time and reliable processing of time-series data using recurrent neural network models, especially long short-term memory (LSTM). Therefore, FPGA-based acceleration of LSTM is imperative for many Industry 4.0 applications. Existing LSTM implementations for FPGAs incur significant resource and power costs and are not energy efficient. Moreover, prior works focusing on reducing latency and power mainly rely on model pruning, which compromises accuracy. In contrast, we propose a memory-based, energy-efficient inference of LSTM by exploiting an overlay in FPGA. In our methodology, we pre-compute predominant operations and store them in the available embedded memory blocks (EMBs) of an FPGA. On demand, these pre-computed results are accessed to minimize the necessary workload. With this methodology, we obtain lower latency, lower power, and better energy efficiency than state-of-the-art LSTM models without any loss of accuracy. Specifically, when implemented on the Zynq ZCU104 evaluation board, a 3x reduction in latency and a 5x reduction in power are obtained compared to the reference 16-bit LSTM model.
KW - Computing with Memory
KW - Energy Efficiency
KW - FPGA
KW - LSTM
KW - Memory-based Mapping
KW - ML
UR - https://www.scopus.com/pages/publications/85169563752
U2 - 10.1109/IJCNN54540.2023.10191667
DO - 10.1109/IJCNN54540.2023.10191667
M3 - Chapter
AN - SCOPUS:85169563752
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - IJCNN 2023 - International Joint Conference on Neural Networks, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 International Joint Conference on Neural Networks, IJCNN 2023
Y2 - 18 June 2023 through 23 June 2023
ER -