FineGates: LLMs Finetuning with Compression using Stochastic Gates
Jonathan Svirsky, Yehonathan Refael, Ofir Lindenbaum

TL;DR
This paper introduces FineGates, a finetuning method for large language models using stochastic gates that sparsify the model, reducing parameters and speeding up inference while maintaining competitive accuracy.
Contribution
The paper proposes a novel stochastic gates-based adaptor for LLM finetuning that sparsifies the frozen model and reduces resource usage with minimal accuracy loss.
Findings
Improves finetuned model accuracy over several baselines.
Enables removal of 20-40% of model parameters without significant accuracy loss.
Speeds up inference with a small number of trainable parameters.
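To make the compression claim concrete, here is a toy sketch of how closed gates could translate into removed parameters. It assumes one gate per output row of a weight matrix; this granularity, the helper names `prune_rows` and `count_params`, and the 10x4 layer are illustrative assumptions, not details taken from the paper.

```python
def prune_rows(weight, gates):
    """Drop rows whose gate is exactly zero; scale surviving rows by their gate."""
    return [[g * w for w in row] for g, row in zip(gates, weight) if g > 0.0]

def count_params(weight):
    """Number of scalar entries in a list-of-rows matrix."""
    return sum(len(row) for row in weight)

# Toy 10x4 layer (hypothetical values) with 3 of its 10 row gates closed.
weight = [[float(i + j) for j in range(4)] for i in range(10)]
gates = [1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]

pruned = prune_rows(weight, gates)
removed_fraction = 1.0 - count_params(pruned) / count_params(weight)
print(f"rows kept: {len(pruned)}, parameters removed: {removed_fraction:.0%}")
```

In this toy case the removed fraction is simply the share of closed gates. Dropping output rows also shrinks the layer's output dimension, so in a real model the next layer would need its matching input columns removed as well.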
Abstract
Large Language Models (LLMs), with billions of parameters, present significant challenges for full finetuning due to the high computational demands, memory requirements, and impracticality in many real-world applications. When faced with limited computational resources or small datasets, updating all model parameters can often result in overfitting. To address this, lightweight finetuning techniques have been proposed, like learning low-rank adapter layers. These methods aim to train only a few additional parameters combined with the base model, which remains frozen, reducing resource usage and mitigating overfitting risks. In this work, we propose an adaptor model based on stochastic gates that simultaneously sparsify the frozen base model with task-specific adaptation. Our method comes with a small number of trainable parameters and allows us to speed up the base model inference with…
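The abstract does not spell out how the gates are parameterized. As a minimal sketch, the block below uses one common formulation from the stochastic-gates literature: a Gaussian-relaxed gate clipped to [0, 1], with a penalty equal to the expected number of open gates. The per-row gating, the value `sigma=0.5`, and all function and variable names are assumptions for illustration and may differ from what FineGates actually does.

```python
import math
import random

def sample_gate(mu, sigma=0.5, rng=random):
    """Draw one relaxed gate: Gaussian noise around mu, hard-clipped to [0, 1]."""
    return min(1.0, max(0.0, mu + rng.gauss(0.0, sigma)))

def open_probability(mu, sigma=0.5):
    """P(mu + noise > 0) = Phi(mu / sigma), with Phi the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))

def sparsity_penalty(mus, sigma=0.5):
    """Expected number of open gates; added to the task loss, it pushes gates shut."""
    return sum(open_probability(mu, sigma) for mu in mus)

def gated_matvec(weight, x, gates):
    """y_i = gate_i * (W x)_i; the frozen weight itself is never updated."""
    return [g * sum(w * xj for w, xj in zip(row, x)) for g, row in zip(gates, weight)]

rng = random.Random(0)
frozen_weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy frozen layer
gate_means = [2.0, -2.0, 0.5]  # hypothetical trainable parameters, one per output row

# Training time: noisy gates, so the optimizer explores opening and closing rows.
train_gates = [sample_gate(mu, rng=rng) for mu in gate_means]
# Inference time: drop the noise and clip the means.
eval_gates = [min(1.0, max(0.0, mu)) for mu in gate_means]
y = gated_matvec(frozen_weight, [1.0, 1.0], eval_gates)
print(eval_gates, y, round(sparsity_penalty(gate_means), 3))
```

Only `gate_means` would be trained in this sketch, which is why the number of trainable parameters stays small relative to the frozen weights. A gate whose mean settles at or below zero silences its row at inference, which is what lets that row be pruned.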
Taxonomy
Topics: Algorithms and Data Compression · Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms
Methods: Balanced Selection · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Adapter
