Deep Companion Learning: Enhancing Generalization Through Historical Consistency

Ruizhao Zhu; Venkatesh Saligrama

arXiv:2407.18821 · cs.CV · July 29, 2024

TL;DR

Deep Companion Learning (DCL) introduces a training approach that improves neural network generalization by leveraging a historical model to provide targeted supervision, leading to state-of-the-art results across multiple datasets and architectures.

Contribution

DCL is a novel training method that enhances generalization by training a deep-companion model (DCM) on previous versions of the primary model and penalizing primary-model predictions that are inconsistent with that historical performance.

Findings

01 Reports state-of-the-art accuracy on CIFAR-100, Tiny-ImageNet, and ImageNet-1K.

02 Effective across diverse architectures, including ShuffleNetV2, ResNet, and Vision Transformer.

03 The approach is validated by theoretical analysis alongside extensive experiments and ablation studies.

Abstract

We propose Deep Companion Learning (DCL), a novel training method for Deep Neural Networks (DNNs) that enhances generalization by penalizing inconsistent model predictions compared to its historical performance. To achieve this, we train a deep-companion model (DCM) by using previous versions of the model to provide forecasts on new inputs. This companion model deciphers a meaningful latent semantic structure within the data, thereby providing targeted supervision that encourages the primary model to address the scenarios it finds most challenging. We validate our approach through both theoretical analysis and extensive experimentation, including ablation studies, on a variety of benchmark datasets (CIFAR-100, Tiny-ImageNet, ImageNet-1K) using diverse architectural models (ShuffleNetV2, ResNet, Vision Transformer, etc.), demonstrating state-of-the-art performance.
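
The abstract describes two coupled objectives: a companion model trained on the outputs of previous versions of the primary model, and a primary model penalized for predictions inconsistent with that history. The sketch below is one possible reading of that idea, not the paper's actual loss, which this page does not state. The KL-divergence penalty, the `weight` parameter, and the function names `primary_loss` and `companion_loss` are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Standard supervised loss for a single example."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def primary_loss(primary_logits, companion_logits, label, weight=0.5):
    """ASSUMED form: cross-entropy plus a penalty for disagreeing with the
    companion's forecast. The paper's real objective may differ."""
    p = softmax(primary_logits)
    c = softmax(companion_logits)
    return cross_entropy(p, label) + weight * kl_divergence(c, p)


def companion_loss(companion_logits, previous_primary_logits):
    """ASSUMED form: the companion is fit to what an earlier version of the
    primary model predicted on the same input."""
    c = softmax(companion_logits)
    h = softmax(previous_primary_logits)
    return kl_divergence(h, c)


if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem with true label 0.
    agree = primary_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0], label=0)
    disagree = primary_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0], label=0)
    print(agree, disagree)
```

Under these assumptions, the penalty term vanishes when the primary model agrees with the companion's forecast and grows as the two distributions diverge, so `disagree` is larger than `agree` in the usage example.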

Taxonomy

Topics: Educator Training and Historical Pedagogy

Methods: Attention Is All You Need · Adam · Max Pooling · Average Pooling · Label Smoothing · Linear Layer · Byte Pair Encoding · Convolution · Layer Normalization · Softmax