Low-Rank Compression for IMC Arrays
Kang Eun Jeon, Johnny Rhe, Jong Hwan Ko

TL;DR
This paper introduces a low-rank compression method for in-memory computing (IMC) architectures. Unlike pruning, it needs no additional peripheral circuitry, and it recovers array utilization and accuracy through shift and duplicate kernel (SDK) mapping and group low-rank convolution.
Contribution
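We propose a novel low-rank compression technique that combines shift and duplicate kernel (SDK) mapping, which improves IMC array utilization, with group low-rank convolution, which addresses the accuracy loss of low-rank compression.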
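As a rough illustration of the group low-rank idea, the sketch below splits an unrolled weight matrix into row groups and factorizes each group separately with a truncated SVD. This is one assumed interpretation, not the authors' formulation: the grouping axis, the layer shape, the rank, the group count, and the helper names `truncated_svd`, `group_low_rank`, and `reconstruct` are all hypothetical.

```python
import numpy as np

def truncated_svd(weight, rank):
    """Best rank-`rank` factorization of `weight` in the Frobenius norm."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]

def group_low_rank(weight, groups, rank):
    """Assumed grouping: split rows into equal blocks, factorize each block."""
    blocks = np.split(weight, groups, axis=0)  # rows must divide evenly
    return [truncated_svd(block, rank) for block in blocks]

def reconstruct(factors):
    """Stack the per-group approximations back into one matrix."""
    return np.vstack([ga @ gb for ga, gb in factors])

rng = np.random.default_rng(0)
# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels,
# unrolled to a (64 * 3 * 3) x 128 matrix.
weight = rng.standard_normal((576, 128))
rank, groups = 16, 4

a, b = truncated_svd(weight, rank)
err_plain = np.linalg.norm(weight - a @ b)

factors = group_low_rank(weight, groups, rank)
err_group = np.linalg.norm(weight - reconstruct(factors))

params_plain = a.size + b.size
params_group = sum(ga.size + gb.size for ga, gb in factors)

print(f"plain: error {err_plain:.2f}, parameters {params_plain}")
print(f"group: error {err_group:.2f}, parameters {params_group}")
```

At the same per-group rank, the grouped approximation can never have a larger Frobenius error than the plain one, because each block is approximated by its own best rank-`rank` factors. In this sketch that comes at the cost of more stored parameters, so a fair comparison would match the parameter budget rather than the rank.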
Findings
Achieves up to 2.5x speedup over pruning methods.
Improves accuracy by up to 20.9%.
Reduces area and energy overheads in IMC architectures.
Abstract
In this study, we address the challenge of low-rank model compression in the context of in-memory computing (IMC) architectures. Traditional pruning approaches, while effective in reducing model size, necessitate additional peripheral circuitry to manage complex dataflows and mitigate dislocation issues, leading to increased area and energy overheads. To circumvent these drawbacks, we propose leveraging low-rank compression techniques, which, unlike pruning, streamline the dataflow and integrate seamlessly with IMC architectures. However, low-rank compression presents its own set of challenges, namely i) suboptimal IMC array utilization and ii) compromised accuracy. To address these issues, we introduce a novel approach employing i) the shift and duplicate kernel (SDK) mapping technique, which exploits idle IMC columns for parallel processing, and ii) group low-rank convolution, which…
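To make the low-rank step concrete, here is a minimal sketch of factorizing an unrolled convolution weight matrix with a truncated SVD, so that one large matrix-vector product becomes two smaller ones. It is an illustration under assumptions, not the paper's method: the layer shape, the rank, and the helper name `low_rank_factorize` are hypothetical, and no IMC array mapping or SDK mapping is modeled.

```python
import numpy as np

def low_rank_factorize(weight, rank):
    """Truncated SVD: weight (m x n) is approximated by a (m x rank) @ b (rank x n)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b

rng = np.random.default_rng(0)
# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels,
# unrolled to a (c_in * k * k) x c_out matrix.
c_in, c_out, k = 64, 128, 3
weight = rng.standard_normal((c_in * k * k, c_out))
rank = 16

a, b = low_rank_factorize(weight, rank)

# One input patch flattened to a row vector.
x = rng.standard_normal((1, c_in * k * k))
y_full = x @ weight        # original layer: one large product
y_low_rank = (x @ a) @ b   # compressed layer: two smaller products in sequence

params_full = weight.size
params_low_rank = a.size + b.size
print(f"parameters: {params_full} -> {params_low_rank}")
```

Both factors remain dense matrices, so the compressed layer is still a sequence of ordinary matrix-vector products, which is consistent with the abstract's point that low-rank compression keeps the dataflow simple. The second factor here has only `rank` rows, which hints at how small factors might leave part of an array unused.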
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Wireless Communication Techniques · Advanced MIMO Systems Optimization
Methods: Pruning · Sparse Evolutionary Training
