ReDistill: Residual Encoded Distillation for Peak Memory Reduction of CNNs
Fang Chen, Gourav Datta, Mujahid Al Rafi, Hyeran Jeon, Meng Tang

TL;DR
ReDistill introduces a residual encoded distillation technique that reduces peak memory by roughly 4x to 5x in CNNs and 4x in diffusion models while maintaining performance, enabling deployment on resource-constrained devices.
Contribution
The paper proposes a novel residual encoded distillation method within a teacher-student framework for peak memory reduction in CNNs and diffusion models.
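To make the teacher-student setting concrete, here is a minimal sketch of *generic* feature-based distillation. It is not ReDistill's residual encoded scheme, which this summary does not detail. The student uses aggressive pooling, so its feature map is smaller than the teacher's. Under the assumption used here, the teacher's map is average-pooled down to the student's resolution and the two are compared with mean squared error. The names `avg_pool`, `feature_mse`, and `teacher_feat` are hypothetical, chosen for illustration.

```python
from typing import List

Grid = List[List[float]]  # one channel of a feature map, indexed [row][col]


def avg_pool(x: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling; assumes both dims divisible by k."""
    h, w = len(x), len(x[0])
    return [
        [sum(x[i * k + di][j * k + dj] for di in range(k) for dj in range(k)) / (k * k)
         for j in range(w // k)]
        for i in range(h // k)
    ]


def feature_mse(student: Grid, teacher: Grid) -> float:
    """MSE between student features and teacher features pooled down to the
    student's (smaller) resolution. Generic feature distillation, NOT ReDistill."""
    k = len(teacher) // len(student)  # resolution gap between teacher and student
    pooled = avg_pool(teacher, k)
    n = len(student) * len(student[0])
    return sum((s - t) ** 2
               for s_row, t_row in zip(student, pooled)
               for s, t in zip(s_row, t_row)) / n


# Hypothetical 4x4 teacher feature map; the student works at 2x2
teacher_feat = [[1.0, 1.0, 3.0, 3.0],
                [1.0, 1.0, 3.0, 3.0],
                [5.0, 5.0, 7.0, 7.0],
                [5.0, 5.0, 7.0, 7.0]]
print(feature_mse([[2.0, 3.0], [5.0, 7.0]], teacher_feat))
```

In training, a term like this would typically be added to the student's task loss with a weighting factor. How ReDistill actually encodes and matches teacher features is described in the paper itself.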
Findings
Achieves 4x-5x peak memory reduction in CNNs with minimal accuracy loss.
Reduces peak memory by 4x in diffusion-based image generation models.
Outperforms existing distillation methods in memory efficiency and performance.
Abstract
The expansion of neural network sizes and the enhanced resolution of modern image sensors result in heightened memory and power demands to process modern computer vision models. In order to deploy these models in extremely resource-constrained edge devices, it is crucial to reduce their peak memory, which is the maximum memory consumed during the execution of a model. A naive approach to reducing peak memory is aggressive down-sampling of feature maps via pooling with large stride, which often results in unacceptable degradation in network performance. To mitigate this problem, we propose residual encoded distillation (ReDistill) for peak memory reduction in a teacher-student framework, in which a student network with less memory is derived from the teacher network using aggressive pooling. We apply our distillation method to multiple problems in computer vision, including image…
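The abstract's definition of peak memory can be illustrated with a back-of-the-envelope calculation. This minimal sketch rests on several assumptions:

- The network is a plain sequential CNN with no skip connections.
- Executing a layer requires both its input and output activations in memory, while earlier buffers are freed.
- Weights are not counted.
- Each element takes 1 byte (e.g. int8).

Both layer configurations are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    """Hypothetical layer description: output channels and spatial stride."""
    out_channels: int
    stride: int  # downsampling factor (conv stride or pooling stride)


def activation_sizes(input_hw: Tuple[int, int], in_channels: int,
                     layers: List[Layer]) -> List[int]:
    """Element counts of the input and of each layer's output activation."""
    h, w = input_hw
    sizes = [in_channels * h * w]
    for layer in layers:
        # Ceil division, as for 'same'-padded strided ops
        h, w = -(-h // layer.stride), -(-w // layer.stride)
        sizes.append(layer.out_channels * h * w)
    return sizes


def peak_memory_bytes(input_hw: Tuple[int, int], in_channels: int,
                      layers: List[Layer], bytes_per_elem: int = 1) -> int:
    """Peak activation memory: the largest input+output pair of any layer."""
    sizes = activation_sizes(input_hw, in_channels, layers)
    return max(a + b for a, b in zip(sizes, sizes[1:])) * bytes_per_elem


# Hypothetical "teacher": gradual 2x down-sampling at every stage
teacher = [Layer(32, 2), Layer(64, 2), Layer(128, 2)]
# Hypothetical "student": one aggressive 8x down-sampling in the stem
student = [Layer(32, 8), Layer(64, 1), Layer(128, 1)]

t_peak = peak_memory_bytes((224, 224), 3, teacher)
s_peak = peak_memory_bytes((224, 224), 3, student)
print(t_peak, s_peak, round(t_peak / s_peak, 2))  # 602112 175616 3.43
```

In this toy setting the peak occurs at the early, high-resolution layers. Down-sampling aggressively in the stem therefore shrinks it (here by about 3.4x; this figure is illustrative and is not the paper's result). The cost is the accuracy loss the abstract describes, which the distillation is meant to recover.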
Taxonomy
Topics: Advanced Memory and Neural Computing
Methods: Attention Is All You Need · Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Layer Normalization · Diffusion · Vision Transformer
