Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures
Abdulmalek Saket

TL;DR
Aletheia introduces a gradient-guided method to selectively apply LoRA adapters to the most relevant layers, significantly improving training efficiency across diverse large language models while maintaining performance.
Contribution
It proposes a novel gradient-based layer selection technique for LoRA fine-tuning, reducing training time without compromising downstream task accuracy.
Findings
Achieves 15-28% training speedup across 14 models
Broadly preserves downstream performance within bounded degradation
Demonstrates consistent speed improvements across multiple architectures
Abstract
Low-Rank Adaptation (LoRA) has become the dominant parameter-efficient fine-tuning method for large language models, yet standard practice applies LoRA adapters uniformly to all transformer layers regardless of their relevance to the downstream task. We introduce Aletheia, a gradient-guided layer selection method that identifies the most task-relevant layers via a lightweight gradient probe and applies LoRA adapters only to those layers with asymmetric rank allocation. Across 81 experiment rows covering 14 successful models from 8 architecture families (0.5B-72B parameters, including dense and Mixture-of-Experts architectures), with one additional documented failed Pythia/GPT-NeoX attempt in Campaign 2, Aletheia achieves a 15-28% training speedup (mean 23.1%, p < 0.001) with bounded extra forgetting and broadly matched downstream behavior on the evaluated MMLU, GSM8K, and HumanEval…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
