Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for Video Recognition with Hierarchical Tucker Tensor Decomposition
Yu Gong, Miao Yin, Lingyi Huang, Chunhua Deng, Yang Sui, Bo Yuan

TL;DR
This paper introduces a novel energy-efficient LSTM model using hierarchical Tucker tensor decomposition and a specialized hardware architecture, significantly reducing model size and improving performance for video recognition tasks.
Contribution
It proposes the FDHT-LSTM model with ultra-low complexity and a tailored hardware design, achieving high accuracy and efficiency improvements over existing models.
Findings
Order-of-magnitude reduction in model size
Significant accuracy improvements in video recognition
Better throughput and energy efficiency than state-of-the-art hardware
Abstract
Long short-term memory (LSTM) is a type of powerful deep neural network that has been widely used in many sequence analysis and modeling applications. However, the large model size of LSTM networks still makes their practical deployment very challenging, especially for video recognition tasks that require high-dimensional input data. Aiming to overcome this limitation and fully unlock the potential of LSTM models, in this paper we propose to perform algorithm and hardware co-design towards high-performance energy-efficient LSTM networks. At the algorithm level, we propose to develop a fully decomposed hierarchical Tucker (FDHT) structure-based LSTM, namely FDHT-LSTM, which enjoys ultra-low model complexity while still achieving high accuracy. In order to fully reap such attractive algorithmic benefits, we further develop the corresponding customized hardware architecture to support…
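To give a rough sense of why a hierarchical Tucker (HT) structure can shrink an LSTM weight matrix so much, the sketch below compares parameter counts for a dense input-to-hidden matrix and for a generic HT format over a balanced binary tree. It is not the paper's FDHT-LSTM: the function names, the mode factorizations, the uniform rank, and the choice to give each leaf a mode of size m_i * n_i are all assumptions made for illustration only.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Parameters of a dense matrix of shape prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def ht_params(in_modes, out_modes, rank):
    """Parameters of a generic hierarchical Tucker format (illustrative assumption).

    The matrix is viewed as a d-way tensor whose i-th mode has size
    in_modes[i] * out_modes[i]. A balanced binary tree is built over the
    modes, with one uniform rank at every non-root node:
      - each leaf stores a (mode_size x rank) frame matrix,
      - each interior node stores a (rank x rank x rank) transfer tensor,
      - the root stores a (rank x rank) transfer matrix.
    """
    leaf_sizes = [m * n for m, n in zip(in_modes, out_modes)]
    if len(leaf_sizes) < 2:
        raise ValueError("need at least two modes to build a tree")

    def count(lo, hi, is_root):
        if hi - lo == 1:                      # leaf node
            return leaf_sizes[lo] * rank
        mid = (lo + hi) // 2                  # split the modes in half
        own = rank * rank * (1 if is_root else rank)
        return own + count(lo, mid, False) + count(mid, hi, False)

    return count(0, len(leaf_sizes), True)


# Hypothetical shapes: a 57,600-dimensional input and 256 hidden units,
# each factorized into four modes. These are not taken from the paper.
in_modes, out_modes, rank = (8, 20, 20, 18), (4, 4, 4, 4), 4
dense = dense_params(in_modes, out_modes)
compressed = ht_params(in_modes, out_modes, rank)
print(dense, compressed, dense // compressed)
```

Under these assumed shapes the dense count grows with the product of the mode sizes while the HT count grows with their sum, which is the source of the large reduction. A full LSTM layer has four such gate matrices on the input side plus the hidden-to-hidden weights, which this sketch does not model.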
Taxonomy
Topics: Tensor decomposition and applications
Methods: Tanh Activation · Sigmoid Activation · TuckER · Long Short-Term Memory
