TL;DR
LIFT is a low-rank informed sparse fine-tuning method that updates only the Principal Weights, the top 5% of weights by magnitude after low-rank approximation, and reports better reasoning performance and knowledge retention than full fine-tuning.
Contribution
The paper introduces LIFT, a sparse fine-tuning approach based on rank reduction. It identifies the critical weights as those with the largest magnitude after low-rank approximation, with the aim of improving both efficiency and reasoning ability.
Findings
LIFT outperforms full fine-tuning on reasoning tasks.
LIFT retains more source-domain knowledge than alternatives.
Updating only 5% of weights yields strong reasoning performance.
Abstract
Recent studies have shown that supervised fine-tuning of LLMs on a small number of high-quality datasets can yield strong reasoning capabilities. However, full fine-tuning (Full FT), while powerful, is computationally expensive and susceptible to overfitting and catastrophic forgetting, particularly when data is limited. Sparse fine-tuning, which previously achieved notable success by updating only a small subset of model parameters, offers a promising trade-off between efficiency and effectiveness. Yet, it has lagged behind in the LLM era due to the difficulty of identifying parameters truly critical for reasoning. In this work, we state that weights with the largest magnitude after low-rank approximation are critical weights for fine-tuning, which we call Principal Weights. Surprisingly, while magnitude-based sparse fine-tuning performs poorly as a baseline on LLM fine-tuning, it…
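The selection rule described in the abstract (take a low-rank approximation of a weight matrix, then keep the entries with the largest magnitude) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `principal_weight_mask` and `masked_sgd_step`, the truncated-SVD approximation, the `rank` value, and the plain SGD update are all hypothetical choices made for the example. Only the 5% density comes from the summary above.

```python
import numpy as np

def principal_weight_mask(W, rank, density=0.05):
    """Boolean mask over W marking the entries that have the largest
    magnitude in a rank-`rank` approximation of W (assumed: truncated SVD)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Rank-`rank` reconstruction from the leading singular triplets.
    W_lr = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]
    # Number of entries to keep, e.g. 5% of the matrix.
    k = max(1, int(round(density * W.size)))
    flat = np.abs(W_lr).ravel()
    top = np.argpartition(flat, -k)[-k:]  # indices of the k largest magnitudes
    mask = np.zeros(W.size, dtype=bool)
    mask[top] = True
    return mask.reshape(W.shape)

def masked_sgd_step(W, grad, mask, lr=1e-2):
    """One sparse update: only masked entries move (plain SGD is an assumption)."""
    return W - lr * grad * mask

# Toy example on a random matrix; shapes and rank are illustrative only.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
mask = principal_weight_mask(W, rank=8, density=0.05)
grad = rng.standard_normal(W.shape)
W_new = masked_sgd_step(W, grad, mask)
```

Note that the mask is computed from the low-rank approximation, while the update is applied to the original weights. Ranking by the magnitude of `W` itself instead would give the plain magnitude-based baseline that the abstract says performs poorly.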
