Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation
Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen, David Z. Pan

TL;DR
This paper introduces a multi-level in situ generation framework that reduces memory access costs in neural networks by recovering parameters on chip instead of fetching them from memory, enabling more efficient deployment on resource-limited devices at comparable accuracy.
Contribution
It presents the first unified approach leveraging bit-level redundancy and intrinsic correlations in DNN kernels to enable on-the-fly high-resolution parameter recovery with minimal hardware overhead.
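As a rough illustration of trading stored parameters for on-the-fly computation, the sketch below keeps only two thin factors of a weight matrix and regenerates the full matrix when it is needed. This is not the paper's algorithm: the truncated-SVD factorization stands in for the "intrinsic correlations" mentioned above, bit-level (precision) redundancy is not modeled, and the function names `compress_low_rank` and `generate_in_situ` are hypothetical.

```python
import numpy as np


def compress_low_rank(weight, rank):
    """Factor a (rows, cols) matrix into two thin factors via truncated SVD.

    Assumption: low-rank structure is used here only as a stand-in for
    correlation between kernels; the paper's actual scheme may differ.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    basis = u[:, :rank] * s[:rank]  # shape (rows, rank)
    coeff = vt[:rank, :]            # shape (rank, cols)
    return basis, coeff


def generate_in_situ(basis, coeff):
    """Rebuild the full weight matrix from the stored factors on demand."""
    return basis @ coeff


# Build a toy weight matrix that is exactly rank 4, so recovery is lossless.
rng = np.random.default_rng(0)
rows, cols, true_rank = 64, 64, 4
weight = rng.standard_normal((rows, true_rank)) @ rng.standard_normal((true_rank, cols))

basis, coeff = compress_low_rank(weight, rank=true_rank)
recovered = generate_in_situ(basis, coeff)

stored = basis.size + coeff.size   # values that must be fetched from memory
ratio = weight.size / stored       # reduction in stored values
max_err = np.abs(weight - recovered).max()

print(f"stored values: {stored}, reduction: {ratio:.1f}x, max error: {max_err:.2e}")
```

In this toy setting the reduction comes purely from the chosen rank. Real kernels are only approximately low rank, so any such scheme has to balance reconstruction error against the number of stored values.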
Findings
Boosts memory efficiency by 10-20x
Achieves comparable accuracy to state-of-the-art methods
Demonstrates effectiveness on multiple neural network architectures
Abstract
Deep neural networks (DNNs) have shown superior performance in a variety of tasks. As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices. Though many efficient accelerator designs, from traditional electronics to emerging photonics, have been successfully demonstrated, they are still bottlenecked by expensive memory accesses due to tremendous gaps between the bandwidth/power/latency of electrical memory and computing cores. Previous solutions fail to fully leverage the ultra-fast computational speed of emerging DNN accelerators to break through the critical memory bound. In this work, we propose a general and unified framework to trade expensive memory transactions for ultra-fast on-chip computations, directly translating to performance improvement. We are the first to jointly explore the…
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Reservoir Computing · Advanced Memory and Neural Computing
