Train Less, Infer Faster: Efficient Model Finetuning and Compression via Structured Sparsity

Jonathan Svirsky; Yehonathan Refael; Ofir Lindenbaum

arXiv:2602.09169·cs.LG·February 11, 2026

TL;DR

This paper introduces a sparsification-based finetuning method for large language models that removes 20-40% of model parameters and reduces inference time without significant accuracy loss, supported by theoretical guarantees and empirical results.

Contribution

The paper presents a sparsification scheme that trains stochastic gates for efficient finetuning and outperforms recent finetuning baselines in efficiency and performance.

Findings

01

Removes 20-40% of model parameters without significant accuracy loss

02

Outperforms recent finetuning baselines in efficiency and performance

03

Provides theoretical convergence guarantees for the sparsification process

Abstract

Fully finetuning foundation language models (LMs) with billions of parameters is often impractical due to high computational costs, memory requirements, and the risk of overfitting. Although methods like low-rank adapters help address these challenges by adding small trainable modules to the frozen LM, they also increase memory usage and do not reduce inference latency. We uncover an intriguing phenomenon: sparsifying specific model rows and columns enables efficient task adaptation without requiring weight tuning. We propose a scheme for effective finetuning via sparsification using training stochastic gates, which requires minimal trainable parameters, reduces inference time, and removes 20-40% of model parameters without significant accuracy loss. Empirical results show it outperforms recent finetuning baselines in efficiency and performance. Additionally, we provide theoretical…
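The abstract's idea of gating rows and columns of a frozen weight matrix can be illustrated with a small NumPy sketch. The excerpt does not give the gate parameterization, so the sketch assumes a Gaussian-relaxed gate, z = clip(mu + eps, 0, 1), of the kind used in earlier stochastic-gate work; the function names (`sample_gates`, `expected_open_gates`, `gated_linear`, `compact`) and all numeric values are hypothetical and not taken from the paper. Only the gate means `mu` would be trained; `W` stays frozen.

```python
import math
import numpy as np

def sample_gates(mu, sigma=0.5, rng=None, train=True):
    """Assumed relaxation: z = clip(mu + eps, 0, 1) with eps ~ N(0, sigma^2).
    At inference (train=False) the noise is dropped and z = clip(mu, 0, 1)."""
    if not train:
        return np.clip(mu, 0.0, 1.0)
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, sigma, size=mu.shape)
    return np.clip(mu + eps, 0.0, 1.0)

def expected_open_gates(mu, sigma=0.5):
    """Sum of P(z > 0) = Phi(mu / sigma): a smooth surrogate for the number
    of open gates, usable as a sparsity penalty (an assumption here)."""
    return sum(0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0)))) for m in mu)

def gated_linear(x, W, row_gate, col_gate):
    """Frozen W of shape (out, in), gated on columns (inputs) and rows (outputs)."""
    return ((x * col_gate) @ W.T) * row_gate

def compact(W, row_gate, col_gate):
    """Fold gate values into W, then drop rows/columns whose gates are closed."""
    keep_r, keep_c = row_gate > 0, col_gate > 0
    W_folded = row_gate[:, None] * W * col_gate[None, :]
    return W_folded[keep_r][:, keep_c], keep_r, keep_c

# Toy usage with made-up gate means (not values from the paper).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                    # frozen weight, shape (out, in)
x = rng.normal(size=(5, 3))                    # batch of inputs
row_mu = np.array([1.2, -0.3, 0.6, -1.0])      # hypothetical learned means
col_mu = np.array([0.9, -0.5, 1.5])
row_gate = sample_gates(row_mu, train=False)
col_gate = sample_gates(col_mu, train=False)

y_full = gated_linear(x, W, row_gate, col_gate)
W_small, keep_r, keep_c = compact(W, row_gate, col_gate)
y_small = x[:, keep_c] @ W_small.T             # smaller matmul at inference
removed = 1.0 - W_small.size / W.size          # fraction of entries dropped
```

In this sketch the compacted matrix reproduces the gated layer's surviving outputs exactly, which is why closed gates can translate into fewer stored parameters and a smaller matrix multiply at inference. The fraction removed in the toy example is an artifact of the chosen values and says nothing about the paper's reported 20-40%.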

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques