FineGates: LLMs Finetuning with Compression using Stochastic Gates

Jonathan Svirsky; Yehonathan Refael; Ofir Lindenbaum

arXiv:2412.12951·cs.LG·December 18, 2024

TL;DR

This paper introduces FineGates, a finetuning method for large language models using stochastic gates that sparsify the model, reducing parameters and speeding up inference while maintaining competitive accuracy.

Contribution

The paper proposes a novel adaptor for LLM finetuning, based on stochastic gates, that sparsifies the frozen base model and reduces resource usage with minimal accuracy loss.

Findings

01. Improves finetuned model accuracy over several baselines.

02. Enables removal of 20-40% of model parameters without significant accuracy loss.

03. Speeds up inference with a small number of trainable parameters.

Abstract

Large Language Models (LLMs), with billions of parameters, present significant challenges for full finetuning due to high computational demands, memory requirements, and impracticality in many real-world applications. When computational resources are limited or datasets are small, updating all model parameters can often result in overfitting. To address this, lightweight finetuning techniques have been proposed, such as learning low-rank adapter layers. These methods train only a few additional parameters that are combined with the base model, which remains frozen, reducing resource usage and mitigating overfitting risks. In this work, we propose an adaptor model based on stochastic gates that simultaneously sparsifies the frozen base model and adapts it to the task. Our method comes with a small number of trainable parameters and allows us to speed up the base model inference with…
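The abstract does not spell out how the gates are parameterized. The following is a minimal sketch, assuming the Gaussian-based relaxation commonly used for stochastic gates (a clipped mean-plus-noise variable with a normal-CDF sparsity penalty) and assuming one gate per output row of a frozen weight matrix. All function names, the toy weights, and the values of `mus` and `sigma` are hypothetical and chosen only for illustration; the paper's actual gate placement and training loss may differ.

```python
import math
import random


def gate_sample(mu, sigma, rng):
    """Training-time gate: clip(mu + sigma * eps, 0, 1) with eps ~ N(0, 1)."""
    return min(1.0, max(0.0, mu + sigma * rng.gauss(0.0, 1.0)))


def gate_open_probability(mu, sigma):
    """P(mu + sigma * eps > 0) = Phi(mu / sigma), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))


def sparsity_penalty(mus, sigma):
    """Expected number of open gates; would be added to the task loss."""
    return sum(gate_open_probability(mu, sigma) for mu in mus)


def deterministic_gate(mu):
    """Inference-time gate: no noise, only the clipped mean."""
    return min(1.0, max(0.0, mu))


def gated_linear(weight_rows, x, gates):
    """y_i = gate_i * <w_i, x>; the frozen weights are never modified."""
    return [g * sum(w * xi for w, xi in zip(row, x))
            for row, g in zip(weight_rows, gates)]


def prune_rows(weight_rows, gates):
    """Keep only rows whose gate is nonzero; return rows, gates, indices."""
    kept = [i for i, g in enumerate(gates) if g > 0.0]
    return [weight_rows[i] for i in kept], [gates[i] for i in kept], kept


# Hypothetical frozen layer (4 output rows, 3 inputs) and learned gate means.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
x = [1.0, 2.0, -1.0]
mus = [1.2, -0.4, 0.3, -1.0]  # the only trainable parameters in this sketch
sigma = 0.5

rng = random.Random(0)
train_gates = [gate_sample(mu, sigma, rng) for mu in mus]  # noisy, in [0, 1]
penalty = sparsity_penalty(mus, sigma)

# At inference, gates with a non-positive mean are exactly zero, so their
# rows can be physically removed from the weight matrix.
infer_gates = [deterministic_gate(mu) for mu in mus]
y_full = gated_linear(W, x, infer_gates)
kept_rows, kept_gates, kept_idx = prune_rows(W, infer_gates)
y_pruned = gated_linear(kept_rows, x, kept_gates)
removed_fraction = 1.0 - len(kept_rows) / len(W)

print("kept rows:", kept_idx, "removed fraction:", removed_fraction)
```

In this toy setup only the gate means are trained, which is why the number of trainable parameters stays small, and the pruned layer produces the same outputs as the gated layer on the rows it keeps while doing less work.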

Taxonomy

Topics: Algorithms and Data Compression · Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms

Methods: Balanced Selection · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Adapter