Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem

TL;DR
Dr$^2$Net introduces a reversible dual-residual network architecture that enables memory-efficient finetuning of large pretrained models across a range of tasks, with performance comparable to standard finetuning.
Contribution
The paper presents a reversible dual-residual network design that reduces memory consumption when finetuning large pretrained models, together with a dynamic training strategy that improves numerical precision.
Findings
Achieves performance comparable to standard finetuning
Significantly reduces memory usage during training
Applicable across multiple pretrained models and tasks
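The memory finding can be illustrated with a deliberately crude model of activation memory. This is a sketch under stated assumptions, not the paper's accounting: it assumes a two-stream (RevNet-style) reversible network that caches only its two output streams and rebuilds one block's internals at a time, while standard backprop caches every block's input and internal activations. The function name, the intra_factor parameter, and the example configuration are all hypothetical.

```python
def activation_bytes(num_blocks, tokens, dim, batch=1, bytes_per_elem=2,
                     intra_factor=0, reversible=False):
    """Rough activation-memory model for backprop through a stack of blocks.

    Illustrative assumptions (hypothetical, not from the paper):
      * a block-boundary tensor holds batch * tokens * dim elements;
      * each block also keeps intra_factor boundary-sized tensors of internal
        activations (e.g. attention and MLP hidden states) for its backward pass;
      * standard backprop caches boundary + internal activations for every block;
      * a reversible network caches only its two output streams and
        recomputes one block's internals at a time during the backward pass.
    """
    unit = batch * tokens * dim * bytes_per_elem  # bytes per boundary tensor
    if reversible:
        # Independent of depth: two streams plus one block's internals.
        return (2 + intra_factor) * unit
    # Grows linearly with depth.
    return num_blocks * (1 + intra_factor) * unit


# Hypothetical ViT-B-sized video encoder: 12 blocks, width 768,
# 1568 tokens (16 frames at 224x224 with 2x16x16 tubelets), fp16,
# and an assumed intra_factor of 8.
standard = activation_bytes(12, 1568, 768, intra_factor=8)
rev = activation_bytes(12, 1568, 768, intra_factor=8, reversible=True)
print(f"standard: {standard / 2**20:.1f} MiB, reversible: {rev / 2**20:.1f} MiB")
```

Under this model the saving grows with depth, since only the standard path scales with num_blocks; the real ratio depends on the architecture and on how much intermediate state the reversible implementation still keeps.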
Abstract
Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks by end-to-end finetuning, which is highly memory-intensive for tasks with high-resolution data, e.g., video understanding, small object detection, and point cloud analysis. In this paper, we propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, a novel family of network architectures that acts as a surrogate network to finetune a pretrained model with substantially reduced memory consumption. Dr$^2$Net contains two types of residual connections: one maintains the residual structure of the pretrained model, and the other makes the network reversible. Because the network is reversible, intermediate activations can be reconstructed from the outputs, so they are cleared from memory during training. We use two coefficients on either type of residual…
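To make the two residual types concrete, here is a minimal NumPy sketch under our own assumptions. The summary does not give Dr$^2$Net's actual equations, so the coupling below is hypothetical: a RevNet-style two-stream update in which alpha scales the pretrained residual and beta scales the reversible residual, with a toy branch tanh(x @ W) standing in for each pretrained block. With beta = 0 and alpha = 1 every update reduces to the ordinary pretrained residual x + F(x); any nonzero beta makes the update invertible, so inputs can be recomputed from outputs instead of being cached.

```python
import numpy as np


def make_blocks(num_blocks, dim, seed=0):
    """Hypothetical stand-ins for the weights of pretrained residual branches."""
    rng = np.random.default_rng(seed)
    return [rng.normal(scale=0.5 / np.sqrt(dim), size=(dim, dim))
            for _ in range(num_blocks)]


def branch(w, x):
    # F_n: the non-residual part of a block (an assumed toy MLP).
    return np.tanh(x @ w)


def forward(blocks, x0, alpha, beta):
    """Assumed dual-residual coupling; blocks alternately update streams a and b.

    new = beta * (same stream)            # reversible residual
        + alpha * other + F(other)        # pretrained residual when alpha = 1
    """
    a, b = x0.copy(), x0.copy()
    for n, w in enumerate(blocks):
        if n % 2 == 0:
            a = beta * a + alpha * b + branch(w, b)
        else:
            b = beta * b + alpha * a + branch(w, a)
    return a, b


def inverse(blocks, a, b, alpha, beta):
    """Reconstruct the input from the two outputs alone (requires beta != 0)."""
    for n in reversed(range(len(blocks))):
        w = blocks[n]
        if n % 2 == 0:
            a = (a - alpha * b - branch(w, b)) / beta
        else:
            b = (b - alpha * a - branch(w, a)) / beta
    return a, b


def reconstruction_error(blocks, x0, alpha, beta):
    """Max absolute error after a forward pass followed by inversion."""
    a, b = forward(blocks, x0, alpha, beta)
    ra, rb = inverse(blocks, a, b, alpha, beta)
    return max(np.abs(ra - x0).max(), np.abs(rb - x0).max())


blocks = make_blocks(num_blocks=12, dim=16)
x0 = np.random.default_rng(1).normal(size=(4, 16))
print(reconstruction_error(blocks, x0, alpha=1.0, beta=1.0))
print(reconstruction_error(blocks, x0, alpha=1.0, beta=1e-3))
```

In this sketch, each inversion divides by beta, so rounding errors are amplified by roughly 1/beta per block: a small beta keeps the network close to the pretrained one but makes reconstruction numerically fragile. That trade-off is one plausible reading of why the paper pairs the two coefficients with a dynamic training strategy, but the actual strategy is not described in this summary.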
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Adversarial Robustness in Machine Learning
