Vision Pair Learning: An Efficient Training Framework for Image Classification
Bei Tong, Xiaoyuan Yu

TL;DR
This paper introduces Vision Pair Learning (VPL), a training framework that pairs a transformer branch with a CNN branch. It exploits their complementary strengths through multi-stage training, which improves image classification accuracy while keeping training efficient.
Contribution
VPL is a novel training framework that enables simultaneous learning of transformer and CNN branches, enhancing performance and reducing training time without external data.
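This summary does not say how the branches learn from each other. The sketch below assumes a mutual-distillation style objective. Each branch is trained on the ground-truth label and also receives a KL term that pulls its softened prediction toward its partner's. The names `pair_loss`, `alpha`, and `temperature`, and the loss form itself, are assumptions made for illustration. They do not describe the authors' pair learning module.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    # Numerically stable softmax with an optional temperature.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    # Standard supervised loss on the ground-truth class index.
    return -math.log(softmax(logits)[label])


def kl_divergence(p: List[float], q: List[float]) -> float:
    # KL(p || q); terms where p is zero contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def pair_loss(student_logits: List[float], partner_logits: List[float],
              label: int, alpha: float = 0.5, temperature: float = 1.0) -> float:
    # Hypothetical pair objective: (1 - alpha) * CE + alpha * KL(partner || student).
    # In an autodiff framework the partner's logits would be detached so that
    # only the student branch receives gradients from this term.
    ce = cross_entropy(student_logits, label)
    target = softmax(partner_logits, temperature)
    pred = softmax(student_logits, temperature)
    # Multiplying by T^2 keeps the KL gradient scale comparable across temperatures.
    kl = kl_divergence(target, pred) * temperature ** 2
    return (1 - alpha) * ce + alpha * kl
```

Under this assumption, each branch uses the other as its partner. For one sample with label `y`, the transformer loss would be `pair_loss(t_logits, c_logits, y)` and the CNN loss would be `pair_loss(c_logits, t_logits, y)`.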
Findings
VPL improves ViT-Base accuracy to 83.47% on ImageNet-1k.
VPL enhances ResNet-50 accuracy to 79.61% on ImageNet-1k.
Experiments confirm the effectiveness of training transformer and CNN branches as a pair.
Abstract
The transformer is a potentially powerful architecture for vision tasks. Although it has more parameters and an attention mechanism, its performance is not currently as dominant as that of CNNs. CNNs are usually computationally cheaper and remain the leading competitor in various vision tasks. One research direction is to adopt the successful ideas of CNNs to improve transformers, but this often relies on elaborate, heuristic network design. Observing that transformers and CNNs are complementary in representation learning and convergence speed, we propose an efficient training framework called Vision Pair Learning (VPL) for image classification. VPL builds a network composed of a transformer branch, a CNN branch, and a pair learning module. With a multi-stage training strategy, VPL enables the branches to learn from their partners during the appropriate stage of the training process, and makes…
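The abstract does not define the stages or what changes between them. The sketch below assumes, purely for illustration, that each stage sets how strongly each branch learns from its partner. The `Stage` class, the boundaries, and the weights in `SCHEDULE` are all hypothetical. The example begins with label-only warm-up, then has the transformer learn from the faster-converging CNN, and ends with mutual learning. These values are not the paper's settings.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    end_epoch: int            # exclusive upper bound of this stage
    alpha_transformer: float  # weight on the transformer learning from the CNN
    alpha_cnn: float          # weight on the CNN learning from the transformer


def stage_weights(epoch: int, stages: List[Stage]) -> Tuple[float, float]:
    # Find the first stage whose end_epoch is strictly greater than `epoch`.
    ends = [s.end_epoch for s in stages]
    idx = bisect_right(ends, epoch)
    # Past the last boundary, keep using the final stage's weights.
    idx = min(idx, len(stages) - 1)
    s = stages[idx]
    return s.alpha_transformer, s.alpha_cnn


# Hypothetical three-stage schedule, for illustration only.
SCHEDULE = [
    Stage(end_epoch=10, alpha_transformer=0.0, alpha_cnn=0.0),   # labels only
    Stage(end_epoch=60, alpha_transformer=0.5, alpha_cnn=0.0),   # transformer learns from CNN
    Stage(end_epoch=100, alpha_transformer=0.5, alpha_cnn=0.5),  # mutual learning
]
```

Keeping the schedule as data rather than hard-coded `if` branches makes it easy to try different stage boundaries. The returned weights would scale each branch's partner term in its loss.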
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
