Adjoined Networks: A Training Paradigm with Applications to Network Compression

Utkarsh Nath; Shrinu Kushagra; Yingzhen Yang

arXiv:2006.05624·cs.LG·April 18, 2022·1 cites

TL;DR

This paper introduces Adjoined Networks, a training paradigm that jointly trains and compresses neural networks, achieving high accuracy with significantly fewer parameters and FLOPs, and extends it with neural architecture search for further efficiency.

Contribution

The paper proposes Adjoined Networks, a novel training paradigm for simultaneous network compression and regularization, and introduces Differentiable Adjoined Networks that optimize architecture and weights jointly.

Findings

01. AN achieves 71.8% top-1 accuracy on ImageNet with 1.8M parameters.

02. DAN reduces parameters by over 3.8x and FLOPs by over 2.2x while maintaining ResNet-50 accuracy.

03. The approach is effective across various large-scale datasets and CNN architectures.

Abstract

Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer…
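The weight sharing and joint training described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The helper names (`adjoined_forward`, `adjoined_loss`), the rule that the compressed network reuses the first `keep` hidden units of the base network, the KL-divergence term that pulls the compressed output toward the base output, and the weight `lam` are all assumptions made for illustration. The abstract states only that the smaller network's parameters are shared and that both networks are trained together.

```python
import math


def linear(weights, x):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjoined_forward(w_hidden, w_out, x, keep):
    """Run the base and the compressed network on the same input.

    Assumption: the compressed network uses only the first `keep` hidden
    units, so every parameter it has is also a parameter of the base network.
    """
    hidden_base = [max(0.0, h) for h in linear(w_hidden, x)]
    logits_base = linear(w_out, hidden_base)

    hidden_small = [max(0.0, h) for h in linear(w_hidden[:keep], x)]
    logits_small = linear([row[:keep] for row in w_out], hidden_small)
    return logits_base, logits_small


def adjoined_loss(logits_base, logits_small, label, lam=1.0):
    """Cross-entropy on the base output plus lam * KL(base || compressed).

    Assumption: this particular combination is illustrative only.
    """
    p = softmax(logits_base)
    q = softmax(logits_small)
    cross_entropy = -math.log(p[label])
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return cross_entropy + lam * kl
```

Because both outputs are computed from the same weight lists, a gradient step on this single loss would update the shared parameters through both networks at once. Under these assumptions, that is one way a single training run can serve the base network and its compressed counterpart together.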

Code & Models

Repositories

utkarshnath/Adjoint-Network (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: Linear Layer