Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training
Chengqian Zhang, Wei Zhu, Kyumin Lee

TL;DR
Hybrid-LoRA is a post-training framework that combines full fine-tuning (FFT) with Low-Rank Adaptation (LoRA) to adapt large language models for complex reasoning tasks, reaching near-FFT performance at reduced training cost.
Contribution
The paper introduces a Hybrid-LoRA Score that determines which modules receive full fine-tuning and which receive LoRA, improving performance over existing PEFT methods in post-training for reasoning tasks.
Findings
Hybrid-LoRA matches full fine-tuning performance with only 10% of modules fully fine-tuned.
It outperforms four state-of-the-art PEFT post-training baselines by up to 5.65%.
Hybrid-LoRA achieves an average improvement of 4.36% over the best baseline.
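The score-based split described above can be illustrated with a minimal sketch. This summary does not define the Hybrid-LoRA Score, so the snippet assumes the scores are already given as one number per module and assumes that higher-scoring modules are the ones that receive full fine-tuning. The function names `split_modules` and `trainable_params`, the module names, and the LoRA rank are all hypothetical. The parameter count uses the standard LoRA formulation, in which a weight of shape (d_out, d_in) gets two low-rank factors with rank * (d_in + d_out) trainable entries in total.

```python
def split_modules(scores, fft_fraction=0.10):
    """Partition module names into (full fine-tuning, LoRA) lists by score.

    Assumption: a higher score means the module is fully fine-tuned.
    The scoring rule itself is not given in this summary.
    """
    if not 0.0 <= fft_fraction <= 1.0:
        raise ValueError("fft_fraction must be in [0, 1]")
    # Rank from highest to lowest score; break ties by name for determinism.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    n_fft = round(fft_fraction * len(ranked))
    return ranked[:n_fft], ranked[n_fft:]


def trainable_params(shapes, fft_modules, lora_modules, rank=8):
    """Count trainable parameters under the hybrid split.

    shapes maps a module name to its weight shape (d_out, d_in).
    FFT trains the full d_out * d_in matrix; LoRA trains two factors
    of shapes (d_out, rank) and (rank, d_in).
    """
    total = 0
    for name in fft_modules:
        d_out, d_in = shapes[name]
        total += d_out * d_in
    for name in lora_modules:
        d_out, d_in = shapes[name]
        total += rank * (d_in + d_out)
    return total


if __name__ == "__main__":
    # Hypothetical scores for ten modules; real scores would come from the paper's method.
    scores = {f"layer{i}.proj": float(i) for i in range(10)}
    shapes = {name: (1024, 1024) for name in scores}
    fft, lora = split_modules(scores, fft_fraction=0.10)
    print("FFT modules:", fft)
    print("trainable parameters:", trainable_params(shapes, fft, lora, rank=8))
```

With a 10% budget over ten modules, exactly one module is placed in the full fine-tuning list and the remaining nine receive LoRA adapters, which mirrors the ratio reported in the findings without reproducing the paper's actual selection rule.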
Abstract
Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and multi-step reasoning. Reinforcement learning with verifiable rewards (RLVR) has recently emerged as a particularly effective post-training paradigm for improving reasoning capabilities, with critic-free algorithms such as GRPO and GSPO enabling scalable optimization. However, RLVR post-training with full fine-tuning (FFT) requires substantial GPU memory and incurs high training costs. Although parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), effectively reduce computational costs, they often suffer from a noticeable performance gap compared to full fine-tuning in post-training for complex reasoning tasks. In this paper, we propose Hybrid-LoRA, an efficient hybrid post-training framework…
