BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training

Wenjie Zhou; Bohan Wang; Wei Chen; Xueqi Cheng

arXiv:2510.25244 · cs.LG · December 29, 2025


TL;DR

This paper introduces BSFA, a framework that accelerates neural network training by applying different scaling factors to the update component along the top eigendirections of the loss Hessian and to the component in the orthogonal subspace, improving both speed and stability.

Contribution

The paper proposes BSFA, a novel plug-and-play method that uses PCA-based subspace estimation and block-wise strategies to improve the training efficiency of large models.
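As a rough illustration of what PCA-based, block-wise subspace estimation could look like, the sketch below runs PCA on a short history of update vectors, separately for each parameter block. The function name `estimate_dom_basis`, the block names, the history length, the centering step, and the choice `k=2` are all assumptions made for this example; the summary above does not specify what the PCA is applied to or how blocks are defined.

```python
import numpy as np

def estimate_dom_basis(update_history, k):
    """Return the top-k principal directions of a stack of update vectors.

    update_history: array of shape (n_steps, d), one flattened update per row.
    Returns an array of shape (d, k) with orthonormal columns.
    """
    # Standard PCA centering (an assumption for this sketch).
    centered = update_history - update_history.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

rng = np.random.default_rng(0)
# Hypothetical parameter blocks, each with 8 recorded updates.
history = {
    "layer1": rng.normal(size=(8, 16)),
    "layer2": rng.normal(size=(8, 32)),
}
# Block-wise: one small basis per block instead of one basis for all parameters.
bases = {name: estimate_dom_basis(h, k=2) for name, h in history.items()}
```

Working per block keeps each decomposition small: the SVD cost depends on the block dimension rather than on the total parameter count.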

Findings

1. Achieves approximately 2× speedup in pre-training LLaMA models.

2. Effectively balances stability and convergence speed through subspace scaling.

3. Demonstrates broad applicability across different tasks and models.

Abstract

Recent studies (Gur-Ari et al., 2018; Song et al., 2024; Wen et al., 2024) highlight a fundamental dichotomy in deep learning optimization: although parameter updates along the top eigendirections of the loss Hessian (Dom-space) capture most of the update magnitude, they often contribute minimally to loss reduction. In contrast, updates in the orthogonal component (Bulk-space) have smaller magnitudes but drive most learning progress. In this work, we further advance the understanding of this phenomenon and introduce the Bulk-Space-Filtration-Accelerator (BSFA), a novel plug-and-play framework. BSFA accelerates training by differentially scaling update components projected onto these distinct subspaces, simultaneously enhancing stability by moderating updates in the Dom-space and boosting convergence speed by amplifying those in the Bulk-space. To ensure BSFA is both…
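The differential scaling described in the abstract can be sketched on a toy problem. The example below is a minimal illustration, not the authors' implementation: the function name `scale_update` and the scale factors `dom_scale=0.5` and `bulk_scale=2.0` are hypothetical, and the Dom-space is taken directly from the eigenvectors of a small, explicitly known Hessian, which is only feasible in a toy setting.

```python
import numpy as np

def scale_update(update, dom_basis, dom_scale=0.5, bulk_scale=2.0):
    """Rescale the Dom-space and Bulk-space parts of an update separately.

    update:    flattened parameter update, shape (d,)
    dom_basis: orthonormal columns spanning the Dom-space, shape (d, k)
    The scale values are illustrative assumptions, not values from the paper.
    """
    dom_part = dom_basis @ (dom_basis.T @ update)  # projection onto Dom-space
    bulk_part = update - dom_part                  # orthogonal remainder
    return dom_scale * dom_part + bulk_scale * bulk_part

# Toy quadratic loss whose Hessian has one sharp direction (eigenvalue 100).
hessian = np.diag([100.0, 1.0, 1.0])
eigvals, eigvecs = np.linalg.eigh(hessian)  # eigenvalues in ascending order
dom_basis = eigvecs[:, -1:]                 # top-1 eigendirection

update = np.array([1.0, 0.2, -0.1])
scaled = scale_update(update, dom_basis)
print(scaled)  # [ 0.5  0.4 -0.2]
```

Here the large component along the sharp direction is halved while the smaller components in the remaining directions are doubled, which mirrors the "moderate Dom-space, amplify Bulk-space" idea in the abstract.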
