SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning
Qifan Yu, Xinyu Ma, Zhijian Zhuo, Minrui Wang, Deyi Liu, Shiyi Zhan, Yiyuan Ma, Liang Xiang, Xingyan Bin, Di He

TL;DR
SPARKLING is a framework for mid-stage width expansion in progressive learning. It stabilizes training by balancing signal preservation against symmetry breaking, and it reduces training cost by up to 35% under 2x width expansion.
Contribution
The paper presents SPARKLING, a method for stable and efficient width expansion during progressive learning. It targets two problems of mid-stage expansion: activation instability under naive initialization and gradient symmetry under copy-based initialization.
Findings
Outperforms training from scratch across multiple models.
Reduces training cost by up to 35% with 2x width expansion.
Effective across various optimizer families.
Abstract
Progressive Learning (PL) reduces pre-training computational overhead by gradually increasing model scale. While prior work has extensively explored depth expansion, width expansion remains significantly understudied, with the few existing methods limited to the early stages of training. However, expanding width during the mid-stage is essential for maximizing computational savings, yet it remains a formidable challenge due to severe training instabilities. Empirically, we show that naive initialization at this stage disrupts activation statistics, triggering loss spikes, while copy-based initialization introduces gradient symmetry that hinders feature diversity. To address these issues, we propose SPARKLING (balancing Signal Preservation And symmetRy breaKing for width-progressive LearnING), a novel framework for mid-stage width expansion. Our method achieves signal…
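The trade-off named in the abstract can be illustrated on a toy model. The sketch below is not SPARKLING itself. It is a minimal NumPy example, assuming a two-layer tanh MLP with a squared-error loss, and all names (W1, W2, forward, grads) are hypothetical. It doubles the hidden width in two ways: a copy-based scheme that duplicates hidden units and halves their outgoing weights, and a naive scheme that fills the new units with random weights. Under these assumptions, the copy keeps the output unchanged but gives each duplicated unit exactly the same gradient as its original, while the naive scheme changes the output immediately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; all values here are illustrative assumptions.
d_in, d_hidden, d_out, batch = 4, 3, 2, 8
X = rng.normal(size=(batch, d_in))
Y = rng.normal(size=(batch, d_out))
W1 = rng.normal(size=(d_in, d_hidden))   # input -> hidden
W2 = rng.normal(size=(d_hidden, d_out))  # hidden -> output


def forward(X, W1, W2):
    """Two-layer MLP with tanh hidden activation."""
    H = np.tanh(X @ W1)
    return H, H @ W2


def grads(X, Y, W1, W2):
    """Gradients of 0.5 * sum-of-squares error, averaged over the batch."""
    H, out = forward(X, W1, W2)
    d_out_act = (out - Y) / X.shape[0]
    gW2 = H.T @ d_out_act
    dH = d_out_act @ W2.T
    gW1 = X.T @ (dH * (1.0 - H**2))  # tanh'(z) = 1 - tanh(z)^2
    return gW1, gW2


# Copy-based 2x expansion: duplicate hidden units, halve outgoing weights.
W1_copy = np.concatenate([W1, W1], axis=1)
W2_copy = np.concatenate([W2, W2], axis=0) / 2.0

# Naive 2x expansion: new hidden units get fresh random weights.
W1_naive = np.concatenate([W1, rng.normal(size=W1.shape)], axis=1)
W2_naive = np.concatenate([W2, rng.normal(size=W2.shape)], axis=0)

_, out_base = forward(X, W1, W2)
_, out_copy = forward(X, W1_copy, W2_copy)
_, out_naive = forward(X, W1_naive, W2_naive)

# Signal preservation: does the expanded network compute the same function?
copy_preserves_output = np.allclose(out_base, out_copy)
naive_preserves_output = np.allclose(out_base, out_naive)

# Symmetry: do duplicated units receive identical gradients?
gW1_copy, gW2_copy = grads(X, Y, W1_copy, W2_copy)
symmetric_gW1 = np.allclose(gW1_copy[:, :d_hidden], gW1_copy[:, d_hidden:])
symmetric_gW2 = np.allclose(gW2_copy[:d_hidden], gW2_copy[d_hidden:])

print("copy preserves output: ", copy_preserves_output)
print("naive preserves output:", naive_preserves_output)
print("copied units share gradients:", symmetric_gW1 and symmetric_gW2)
```

Because the duplicated units start identical and receive identical gradients, plain gradient descent keeps them identical in this toy setting, so the extra width adds no new features. This is the symmetry that an expansion scheme has to break without disturbing the preserved signal.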
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
