MixTraining: A Better Trade-Off Between Compute and Performance
Zexin Li, Jiancheng Zhang, Yufei Li, Yinglun Zhu, Cong Liu

TL;DR
MixTraining is a novel framework that interleaves self-supervised and supervised learning epochs to improve model accuracy while reducing computational costs, offering a better trade-off between compute and performance.
Contribution
It introduces a unified training approach that combines SSL and SL with smooth transitions, enhancing synergy and efficiency in model training.
Findings
Achieves an 8.81% absolute accuracy gain on TinyImageNet.
Accelerates training by up to 1.29x with ViT-Tiny.
Demonstrates superior compute-performance trade-off compared to traditional methods.
Abstract
Incorporating self-supervised learning (SSL) before standard supervised learning (SL) has become a widely used strategy to enhance model performance, particularly in data-limited scenarios. However, this approach introduces a trade-off between computation and performance: while SSL helps with representation learning, it requires a separate, often time-consuming training phase, increasing computational overhead and limiting efficiency in resource-constrained settings. To address these challenges, we propose MixTraining, a novel framework that interleaves several SSL and SL epochs within a unified MixTraining phase, featuring a smooth transition between the two learning objectives. MixTraining enhances synergy between SSL and SL for improved accuracy and consolidates shared computation steps to reduce computational overhead. MixTraining is versatile and applicable to both single-task…
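The idea of a smooth transition between the SSL and SL objectives can be illustrated with a small sketch. The abstract does not specify the transition rule, so everything here is an assumption made for illustration: the function names `ssl_weight` and `mixed_loss` are hypothetical, and the linearly decaying weight on a blended loss is a simple stand-in, not the schedule used in the paper.

```python
def ssl_weight(epoch: int, mix_epochs: int) -> float:
    """Hypothetical linear schedule (an assumption, not the paper's rule).

    Returns the weight on the SSL loss, decaying from 1.0 at the first
    mixed epoch to 0.0 at the last one.
    """
    if mix_epochs <= 1:
        return 0.0
    return max(0.0, 1.0 - epoch / (mix_epochs - 1))


def mixed_loss(ssl_loss: float, sl_loss: float, epoch: int, mix_epochs: int) -> float:
    """Blend the two objectives with the assumed linear weight."""
    w = ssl_weight(epoch, mix_epochs)
    return w * ssl_loss + (1.0 - w) * sl_loss


# Weight on the SSL objective over a five-epoch mixed phase.
schedule = [round(ssl_weight(e, 5), 2) for e in range(5)]
print(schedule)
```

Under this assumed schedule, early epochs are dominated by the SSL objective and later ones by the SL objective, so there is no hard switch between two separate training phases.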
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
