SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning

Qifan Yu; Xinyu Ma; Zhijian Zhuo; Minrui Wang; Deyi Liu; Shiyi Zhan; Yiyuan Ma; Liang Xiang; Xingyan Bin; Di He

arXiv:2602.02472·cs.LG·February 3, 2026

TL;DR

SPARKLING is a framework for mid-stage width expansion in progressive learning. It stabilizes training by balancing signal preservation and symmetry breaking, and reduces training cost by up to 35% under 2x width expansion.

Contribution

The paper presents SPARKLING, a method for stable and efficient width expansion during progressive learning. It addresses two problems that arise in mid-stage expansion: disrupted activation statistics under naive initialization and gradient symmetry under copy-based initialization.

Findings

01. Outperforms training from scratch across multiple models.

02. Reduces training cost by up to 35% with 2x width expansion.

03. Effective across various optimizer families.

Abstract

Progressive Learning (PL) reduces pre-training computational overhead by gradually increasing model scale. While prior work has extensively explored depth expansion, width expansion remains significantly understudied, with the few existing methods limited to the early stages of training. However, expanding width during the mid-stage is essential for maximizing computational savings, yet it remains a formidable challenge due to severe training instabilities. Empirically, we show that naive initialization at this stage disrupts activation statistics, triggering loss spikes, while copy-based initialization introduces gradient symmetry that hinders feature diversity. To address these issues, we propose SPARKLING (balancing Signal Preservation And symmetRy breaKing for width-progressive LearnING), a novel framework for mid-stage width expansion. Our method achieves signal…
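The gradient symmetry that the abstract attributes to copy-based initialization can be illustrated with a small NumPy sketch. This is not the SPARKLING method. It assumes a toy two-layer ReLU network with a mean-squared-error style loss, and a generic function-preserving copy scheme (duplicate the incoming weights, halve the outgoing weights). The names `forward`, `grads`, and `expand_by_copy` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def forward(x, w1, w2):
    """Toy two-layer network: y = relu(x @ w1) @ w2."""
    pre = x @ w1
    h = np.maximum(pre, 0.0)
    return pre, h, h @ w2

def grads(x, y_target, w1, w2):
    """Gradients of 0.5 * mean-over-batch squared error w.r.t. w1 and w2."""
    pre, h, y = forward(x, w1, w2)
    dy = (y - y_target) / x.shape[0]
    g_w2 = h.T @ dy
    dh = dy @ w2.T
    g_w1 = x.T @ (dh * (pre > 0))  # ReLU passes gradient only where pre > 0
    return g_w1, g_w2

def expand_by_copy(w1, w2):
    """Assumed copy scheme: double the hidden width by duplicating every
    hidden unit and halving its outgoing weights, so the output is unchanged."""
    w1_wide = np.concatenate([w1, w1], axis=1)
    w2_wide = np.concatenate([w2 / 2.0, w2 / 2.0], axis=0)
    return w1_wide, w2_wide

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y_target = rng.normal(size=(8, 3))
w1 = rng.normal(size=(4, 5))
w2 = rng.normal(size=(5, 3))
d_h = w1.shape[1]

w1_wide, w2_wide = expand_by_copy(w1, w2)

# Signal preservation: the widened network computes the same function.
_, _, y_small = forward(x, w1, w2)
_, _, y_wide = forward(x, w1_wide, w2_wide)
output_preserved = np.allclose(y_small, y_wide)

# Symmetry: each copied unit receives the same gradient as its original,
# so a plain gradient step keeps the two halves identical.
g_w1, g_w2 = grads(x, y_target, w1_wide, w2_wide)
symmetric = (np.allclose(g_w1[:, :d_h], g_w1[:, d_h:])
             and np.allclose(g_w2[:d_h], g_w2[d_h:]))

print("output preserved:", output_preserved)
print("copies get identical gradients:", symmetric)
```

In this toy setting the copy preserves the network output exactly, but the original and duplicated units stay tied under plain gradient descent, which is one way to read the abstract's remark that copying hinders feature diversity.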

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications