Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

Chen Zhao; Shuming Liu; Karttikeya Mangalam; Guocheng Qian; Fatimah Zohra; Abdulmohsen Alghannam; Jitendra Malik; Bernard Ghanem

arXiv:2401.04105·cs.CV·April 2, 2024·1 cites

TL;DR

Dr$^2$Net introduces a reversible dual-residual network architecture that enables memory-efficient finetuning of large pretrained models across various tasks without sacrificing performance.

Contribution

The paper presents a novel reversible dual-residual network design that reduces memory consumption during finetuning of large models, with a dynamic training strategy for improved precision.

Findings

01. Achieves comparable performance to standard finetuning

02. Significantly reduces memory usage during training

03. Applicable across multiple pretrained models and tasks

Abstract

Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks by end-to-end finetuning, which is highly memory-intensive for tasks with high-resolution data, e.g., video understanding, small object detection, and point cloud analysis. In this paper, we propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, a novel family of network architectures that acts as a surrogate network to finetune a pretrained model with substantially reduced memory consumption. Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible. Due to its reversibility, intermediate activations, which can be reconstructed from output, are cleared from memory during training. We use two coefficients on either type of residual…
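The reversibility idea described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the update rule, the placement of the coefficients `alpha` and `beta`, the scalar activations, and all function names are hypothetical and are not taken from the paper's equations or from the coolbay/Dr2Net code. The sketch only shows how a two-stream block with two residual connections can be inverted, so that intermediate activations need not be stored.

```python
import math

def make_block(weight):
    """Toy stand-in for a pretrained block body (hypothetical)."""
    return lambda x: math.tanh(weight * x)

def block_forward(x_prev, y_prev, g, alpha, beta):
    # alpha scales the residual kept from the pretrained network;
    # beta scales the second residual that carries x_prev forward,
    # which is what makes the block invertible (assumed form).
    x_next = g(x_prev) + alpha * x_prev + y_prev
    y_next = beta * x_prev
    return x_next, y_next

def block_inverse(x_next, y_next, g, alpha, beta):
    # Requires beta != 0: x_prev is recovered from the second stream.
    x_prev = y_next / beta
    y_prev = x_next - g(x_prev) - alpha * x_prev
    return x_prev, y_prev

def run_forward(x0, blocks, alpha, beta):
    # Only the final (x, y) pair is returned; nothing is cached.
    x, y = x0, 0.0
    for g in blocks:
        x, y = block_forward(x, y, g, alpha, beta)
    return x, y

def reconstruct_input(x, y, blocks, alpha, beta):
    # Walk the blocks in reverse, rebuilding each earlier activation.
    for g in reversed(blocks):
        x, y = block_inverse(x, y, g, alpha, beta)
    return x, y

blocks = [make_block(w) for w in (0.7, -1.2, 0.4, 0.9)]
out_x, out_y = run_forward(0.3, blocks, alpha=0.5, beta=0.5)
rec_x, rec_y = reconstruct_input(out_x, out_y, blocks, alpha=0.5, beta=0.5)
```

In this toy form, setting `alpha=1` and `beta=0` reduces each block to an ordinary residual block `g(x) + x`, which is not invertible because the inverse divides by `beta`. That is one way to see why the two coefficients matter, though the paper's actual choice and schedule for them are not given in the truncated abstract above.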

Code & Models

Repositories

coolbay/Dr2Net (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Adversarial Robustness in Machine Learning