Adjoined Networks: A Training Paradigm with Applications to Network Compression
Utkarsh Nath, Shrinu Kushagra, Yingzhen Yang

TL;DR
This paper introduces Adjoined Networks, a training paradigm that jointly trains and compresses neural networks, achieving high accuracy with significantly fewer parameters and FLOPs, and extends it with neural architecture search for further efficiency.
Contribution
The paper proposes Adjoined Networks (AN), a novel training paradigm for simultaneous network compression and regularization, and introduces Differentiable Adjoined Networks (DAN), which optimize architecture and weights jointly.
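The contribution statement does not spell out how DAN optimizes architecture and weights jointly. A common way to make an architecture choice differentiable is a DARTS-style continuous relaxation. The sketch below applies it to the choice of a layer's width. Several pieces are assumptions for illustration, not details taken from the paper: the search space (a few candidate widths that all slice one shared weight matrix), the softmax over architecture logits `alpha`, and the function names.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of architecture logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def masked_linear(x, W, width):
    """Linear layer that keeps only the first `width` output units.

    Every candidate width slices the same weight matrix W (hypothetical
    search space), so candidates share parameters rather than owning copies.
    """
    out = x @ W.T
    mask = np.zeros(W.shape[0])
    mask[:width] = 1.0
    return out * mask


def mixed_layer(x, W, candidate_widths, alpha):
    """Continuous relaxation: softmax(alpha)-weighted sum over candidate widths."""
    probs = softmax(alpha)
    return sum(p * masked_linear(x, W, k) for p, k in zip(probs, candidate_widths))


def expected_params(in_features, candidate_widths, alpha):
    """Expected weight count of the layer; smooth in alpha, usable as a cost term."""
    probs = softmax(alpha)
    return float(sum(p * k * in_features for p, k in zip(probs, candidate_widths)))


def chosen_width(candidate_widths, alpha):
    """After search, keep the candidate width with the largest logit."""
    return candidate_widths[int(np.argmax(alpha))]


# Usage: one layer with 3 inputs and up to 8 outputs, searching over widths 2, 4, 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W = rng.normal(size=(8, 3))
widths = [2, 4, 8]
alpha = np.zeros(3)  # start with all candidates equally likely
y = mixed_layer(x, W, widths, alpha)
cost = expected_params(3, widths, alpha)
```

After search, `chosen_width` keeps the most probable candidate. In a fuller implementation, `expected_params` could be added to the training loss so that gradient descent on `alpha` trades accuracy against model size.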
Findings
AN achieves 71.8% top-1 accuracy with 1.8M parameters on ImageNet.
DAN reduces parameters and FLOPs by over 3.8x and 2.2x respectively, while maintaining ResNet-50 accuracy.
The approach is effective across various large-scale datasets and CNN architectures.
Abstract
Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer…
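The abstract describes the training paradigm only at a high level. The sketch below shows one way it could look for a toy two-layer classifier. It assumes two things that are not taken from the paper. First, the compressed network is formed by keeping the first `k` hidden units, so its weights are slices of the base network's weights. Second, the loss is cross-entropy on the base network plus a weighted KL divergence from the base network's soft labels to the compressed network's output, in the spirit of knowledge distillation. The names `forward`, `adjoined_loss`, `k`, and `lam` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise numerically stable softmax."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward(x, W1, W2, k=None):
    """Two-layer ReLU classifier returning class probabilities.

    With k=None this is the base (teacher) network. With an integer k it is
    the compressed (student) network: it uses only the first k hidden units,
    i.e. slices of the *same* W1 and W2, so no extra parameters are created.
    """
    if k is not None:
        W1, W2 = W1[:k], W2[:, :k]
    h = np.maximum(x @ W1.T, 0.0)
    return softmax(h @ W2.T)


def adjoined_loss(x, y, W1, W2, k, lam=1.0, eps=1e-12):
    """Cross-entropy on the base network plus lam * KL(base || compressed)."""
    p = forward(x, W1, W2)     # teacher probabilities, used as soft labels
    q = forward(x, W1, W2, k)  # student probabilities from the shared weights
    n = len(y)
    ce = -np.mean(np.log(p[np.arange(n), y] + eps))
    kl = np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1))
    return ce + lam * kl


# Usage: 4 input features, 6 hidden units in the base network, 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
W1 = rng.normal(size=(6, 4))
W2 = rng.normal(size=(3, 6))
y = np.array([0, 1, 2, 0, 1])
loss = adjoined_loss(x, y, W1, W2, k=2, lam=0.5)
```

Because both outputs are computed from the same `W1` and `W2`, a gradient step on this loss updates the shared weights for both networks at once. In this toy model the student's weight count is k·(in + classes), compared with H·(in + classes) for a base network with H hidden units.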
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
Methods: Linear Layer
