TL;DR
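This paper introduces a hybrid fine-tuning method for LLMs that jointly updates the full model and PEFT modules using a combination of zeroth-order and first-order optimization. A convergence analysis built on a hybrid smoothness condition supports the method, and experiments report consistent improvements over existing methods.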
Contribution
It proposes a novel hybrid fine-tuning algorithm for LLMs, along with a convergence analysis framework based on hybrid smoothness, and demonstrates its effectiveness across tasks.
Findings
Consistent performance improvements over existing methods
Theoretical convergence guarantees for the hybrid optimization algorithm
Effective across various downstream tasks and model architectures
Abstract
Fine-tuning Large Language Models (LLMs) typically involves either full fine-tuning, which updates all model parameters, or Parameter-Efficient Fine-Tuning (PEFT), which adjusts a small subset of parameters. However, both approaches have inherent limitations: full fine-tuning is computationally expensive, while PEFT often struggles to learn new knowledge and exhibits suboptimal performance. To overcome these issues, we propose a novel hybrid fine-tuning approach that jointly updates both LLMs and PEFT modules using a combination of zeroth-order and first-order optimization methods. To analyze our new algorithm, we develop a theoretical framework centered on a hybrid smoothness condition, which accounts for the heterogeneous nature of the optimization landscape in joint LLM and PEFT training. We derive a rigorous convergence analysis for the convergence of reshuffling-type…
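To make the idea of mixing zeroth-order and first-order updates concrete, here is a minimal sketch on a toy linear model. It is not the paper's algorithm. The abstract does not say which optimizer is applied to which parameter group, so the split used here (a two-point zeroth-order estimate for the large "base" vector `w`, an exact gradient for the small "adapter" scalar `a`) is an assumption, as are all function names, step sizes, and the toy data.

```python
import random


def loss(w, a, data):
    """Mean squared error of a toy model: pred = w . x + a."""
    total = 0.0
    for x, y in data:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + a
        total += 0.5 * (pred - y) ** 2
    return total / len(data)


def zo_grad_base(w, a, data, rng, mu=1e-3):
    """Two-point zeroth-order estimate of d(loss)/dw.

    Uses only two loss evaluations along a random Gaussian direction u,
    so no backpropagation through the "base" parameters is needed.
    """
    u = [rng.gauss(0.0, 1.0) for _ in w]
    w_plus = [wi + mu * ui for wi, ui in zip(w, u)]
    w_minus = [wi - mu * ui for wi, ui in zip(w, u)]
    slope = (loss(w_plus, a, data) - loss(w_minus, a, data)) / (2.0 * mu)
    return [slope * ui for ui in u]


def fo_grad_adapter(w, a, data):
    """Exact first-order derivative d(loss)/da for the scalar adapter."""
    residuals = (sum(wi * xi for wi, xi in zip(w, x)) + a - y for x, y in data)
    return sum(residuals) / len(data)


def hybrid_step(w, a, data, rng, lr_base=0.05, lr_adapter=0.5):
    """One joint update: zeroth-order on w, first-order on a (assumed split)."""
    g_w = zo_grad_base(w, a, data, rng)
    g_a = fo_grad_adapter(w, a, data)
    new_w = [wi - lr_base * gi for wi, gi in zip(w, g_w)]
    new_a = a - lr_adapter * g_a
    return new_w, new_a


def run_demo(seed=0, steps=500):
    """Train on noise-free synthetic data; return (initial_loss, final_loss)."""
    rng = random.Random(seed)
    w_true, a_true = [1.0, -2.0, 0.5, 3.0], 0.7  # hypothetical targets
    data = []
    for _ in range(32):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        y = sum(wi * xi for wi, xi in zip(w_true, x)) + a_true
        data.append((x, y))
    w, a = [0.0] * len(w_true), 0.0
    initial = loss(w, a, data)
    for _ in range(steps):
        w, a = hybrid_step(w, a, data, rng)
    return initial, loss(w, a, data)


if __name__ == "__main__":
    initial_loss, final_loss = run_demo()
    print(f"initial loss: {initial_loss:.4f}, final loss: {final_loss:.3e}")
```

The two parameter groups use different step sizes because the zeroth-order estimate is much noisier than the exact gradient. That kind of heterogeneity is presumably what a per-group smoothness condition is meant to capture, though the sketch makes no claim about the paper's actual analysis.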