ReDistill: Residual Encoded Distillation for Peak Memory Reduction of CNNs

Fang Chen; Gourav Datta; Mujahid Al Rafi; Hyeran Jeon; Meng Tang

arXiv:2406.03744·cs.CV·April 29, 2025

TL;DR

ReDistill is a residual encoded distillation technique that reduces peak memory by roughly 4x-5x in CNNs and by 4x in diffusion-based image generation models with minimal loss in performance, which makes deployment on resource-constrained edge devices more practical.

Contribution

The paper proposes a novel residual encoded distillation method within a teacher-student framework for peak memory reduction in CNNs and diffusion models.

Findings

1. Achieves 4x-5x peak memory reduction in CNNs with minimal accuracy loss.

2. Reduces peak memory by 4x in diffusion-based image generation models.

3. Outperforms existing distillation methods in memory efficiency and performance.

Abstract

The expansion of neural network sizes and the enhanced resolution of modern image sensors result in heightened memory and power demands to process modern computer vision models. In order to deploy these models in extremely resource-constrained edge devices, it is crucial to reduce their peak memory, which is the maximum memory consumed during the execution of a model. A naive approach to reducing peak memory is aggressive down-sampling of feature maps via pooling with large stride, which often results in unacceptable degradation in network performance. To mitigate this problem, we propose residual encoded distillation (ReDistill) for peak memory reduction in a teacher-student framework, in which a student network with less memory is derived from the teacher network using aggressive pooling. We apply our distillation method to multiple problems in computer vision, including image…
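To make the notion of peak memory concrete, the following is a minimal sketch, not the authors' method or measurement procedure. It assumes a purely sequential network in which each hypothetical `Stage` is a fused convolution plus pooling step, and it counts only the input and output feature maps that are live while a stage runs (weights and workspace buffers are ignored). The layer widths, strides, and input size are illustrative choices, not values from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    """Hypothetical fused conv + pooling stage (illustrative, not from the paper)."""
    out_channels: int
    pool_stride: int  # spatial down-sampling factor applied by this stage


def peak_activation_bytes(
    stages: List[Stage],
    input_shape: Tuple[int, int, int],
    bytes_per_value: int = 4,  # assumption: float32 activations
) -> int:
    """Return the largest (input + output) feature-map footprint over all stages."""
    channels, height, width = input_shape
    peak = 0
    for stage in stages:
        in_values = channels * height * width
        # Pooling with stride s shrinks each spatial dimension by a factor of s.
        height //= stage.pool_stride
        width //= stage.pool_stride
        channels = stage.out_channels
        out_values = channels * height * width
        # Both tensors must be resident while the stage executes.
        peak = max(peak, (in_values + out_values) * bytes_per_value)
    return peak


# Toy "teacher": no down-sampling in the first stage.
teacher = [Stage(64, 1), Stage(128, 2), Stage(256, 2)]
# Toy "student": same widths, but pooling with a larger stride in the first stage.
student = [Stage(64, 2), Stage(128, 2), Stage(256, 2)]

teacher_peak = peak_activation_bytes(teacher, (3, 224, 224))
student_peak = peak_activation_bytes(student, (3, 224, 224))
print(f"teacher peak: {teacher_peak / 2**20:.2f} MiB")
print(f"student peak: {student_peak / 2**20:.2f} MiB")
print(f"reduction:    {teacher_peak / student_peak:.1f}x")
```

Under these toy assumptions, moving down-sampling earlier shrinks every later feature map, so the peak drops sharply. The sketch says nothing about accuracy, which is the degradation the abstract says distillation is meant to mitigate.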

Taxonomy

Topics: Advanced Memory and Neural Computing

Methods: Attention Is All You Need · Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Layer Normalization · Diffusion · Vision Transformer