TL;DR
This paper introduces SIFT, a spectral optimization framework that enables controllable model steering by orthogonalizing subspaces, effectively addressing interference between objectives and constraints in model training.
Contribution
The paper presents a novel spectral interference-free training method, SIFT, that improves constrained model adaptation across diverse applications by mitigating objective-constraint conflicts.
Findings
SIFT outperforms control-based and control-free baselines in multiple tasks.
Spectral orthogonalization resolves cross-task interference during training.
SIFT achieves robust performance improvements in safety, unlearning, speech, and hallucination mitigation.
Abstract
Foundation models, such as large language models (LLMs), are powerful but often require customization before deployment to satisfy practical constraints such as safety, privacy, and task-specific requirements, leading to "constrained" optimization problems for model steering and adaptation. However, solving such problems remains largely underexplored and is particularly challenging due to interference between the primary objective and constraint objectives during optimization. In this paper, we propose a subspace control framework for constrained model training. Specifically, (i) we first analyze, from a model merging perspective, how spectral cross-task interference arises and show that it can be resolved via a one-shot solution that orthogonalizes the merged subspace; (ii) we establish a connection between this solution and gradient orthogonalization in the spectral optimizer Muon;…
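The abstract's central operation is replacing an update matrix with its orthogonal polar factor. For G = U S V^T, that factor is U V^T, and Muon approximates it with Newton-Schulz iterations instead of an SVD. The sketch below illustrates only that operation. It is not SIFT. Everything in it is an assumption for illustration: the helper names `matmul`, `transpose`, and `newton_schulz_orthogonalize`, the classic cubic iteration (Muon's reference implementation uses a tuned quintic polynomial), the step count, and the toy "objective" and "constraint" updates that are summed and then orthogonalized to mimic a merged subspace.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Plain-Python matrix transpose."""
    return [list(col) for col in zip(*A)]


def newton_schulz_orthogonalize(G, steps=30):
    """Approximate the polar factor U V^T of G (where G = U S V^T) without an SVD.

    Assumption: classic cubic Newton-Schulz iteration; Muon's reference
    implementation uses a tuned quintic polynomial instead.
    """
    norm = math.sqrt(sum(x * x for row in G for x in row))
    if norm == 0.0:
        raise ValueError("cannot orthogonalize a zero matrix")
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # inside the cubic iteration's convergence region (0, sqrt(3)).
    X = [[x / norm for x in row] for row in G]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 X X^T X pushes each nonzero singular value toward 1
        # while leaving the singular vectors U and V unchanged.
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X


# Hypothetical toy updates: one strong direction (objective), one weak one (constraint).
delta_objective = [[3.0, 0.0], [0.0, 0.0]]
delta_constraint = [[0.0, 0.0], [0.0, 0.2]]
merged = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(delta_objective, delta_constraint)]

Q = newton_schulz_orthogonalize(merged)
print([[round(v, 6) for v in row] for row in Q])  # prints [[1.0, 0.0], [0.0, 1.0]]
```

After orthogonalization both directions have unit singular value, so the weak constraint direction is no longer dominated by the much larger objective direction. The abstract does not say whether this matches how SIFT handles interference in detail.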
