Olica: Efficient Structured Pruning of Large Language Models without Retraining

Jiujun He; Huazhen Lin

arXiv:2506.08436 · cs.CL · June 11, 2025

TL;DR

Olica is a structured pruning framework for large language models that eliminates retraining by applying PCA to the matrix products inside multi-head attention and using linear calibration to limit error accumulation, which cuts computational cost while preserving model accuracy.

Contribution

It introduces a retraining-free structured pruning method for LLMs using PCA and SVD, improving efficiency and preserving model performance.

Findings

01

Reduces the complexity of PCA by a factor of the square of the number of attention heads

02

Maintains accuracy without retraining across multiple benchmarks

03

Uses linear calibration to mitigate error accumulation in pruned models
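
The listing does not spell out how the linear calibration is fitted, so the following is only a minimal sketch of one plausible reading: fit a linear map from a layer's input to the residual between the original and pruned outputs, then add that correction to the pruned output. The function name `fit_linear_calibration`, the ridge penalty, and the calibration arrays are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

def fit_linear_calibration(x, y_original, y_pruned, ridge=1e-3):
    """Fit W so that y_pruned + x @ W approximates y_original.

    Hypothetical helper: ridge regression on the residual is an assumption,
    not a detail confirmed by the source.
    x:          (n_samples, d_in)  calibration inputs to the layer
    y_original: (n_samples, d_out) outputs of the unpruned layer
    y_pruned:   (n_samples, d_out) outputs of the pruned layer
    """
    residual = y_original - y_pruned
    d_in = x.shape[1]
    # Regularized normal equations: (X^T X + ridge * I) W = X^T R
    gram = x.T @ x + ridge * np.eye(d_in)
    return np.linalg.solve(gram, x.T @ residual)

# Synthetic calibration data where the residual is exactly linear in x.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 6))
true_w = rng.standard_normal((6, 5))
y_pruned = rng.standard_normal((200, 5))
y_original = y_pruned + x @ true_w

w = fit_linear_calibration(x, y_original, y_pruned, ridge=1e-8)
y_calibrated = y_pruned + x @ w  # corrected output of the pruned layer
```

Because the correction is a single linear map fitted in closed form, it needs only a small calibration set and no gradient-based retraining, which matches the retraining-free claim above.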

Abstract

Most existing structured pruning methods for Large Language Models (LLMs) require substantial computational and data resources for retraining to reestablish the corrupted correlations, making them prohibitively expensive. To address this, we propose a pruning framework for LLMs called Orthogonal decomposition and Linear Calibration (Olica), which eliminates the need for retraining. A key observation is that the multi-head attention (MHA) layer depends on two types of matrix products. By treating these matrix products as unified entities and applying principal component analysis (PCA), we extract the most important information to compress LLMs without sacrificing accuracy or disrupting their original structure. Consequently, retraining becomes unnecessary. A fast decomposition method is devised, reducing the complexity of PCA by a factor of the square of the number of attention heads.…
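
To make the "matrix products as unified entities" idea concrete, here is a minimal NumPy sketch under stated assumptions: a single head whose attention scores depend on the product of a query and a key projection, with positional encodings and biases ignored. It uses a plain truncated SVD of the product as a stand-in for the PCA step; it is not the paper's fast decomposition, and `compress_qk_product`, the shapes, and the chosen rank are hypothetical.

```python
import numpy as np

def compress_qk_product(w_q, w_k, rank):
    """Approximate w_q @ w_k.T with two thinner factors of width `rank`.

    Hypothetical helper for illustration only.
    w_q, w_k: (d_model, d_head) projection matrices of one attention head.
    Returns new_w_q, new_w_k of shape (d_model, rank).
    """
    product = w_q @ w_k.T                      # (d_model, d_model), rank <= d_head
    u, s, vt = np.linalg.svd(product, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])                 # split singular values across both factors
    new_w_q = u[:, :rank] * sqrt_s
    new_w_k = vt[:rank].T * sqrt_s
    return new_w_q, new_w_k

rng = np.random.default_rng(0)
w_q = rng.standard_normal((16, 8))
w_k = rng.standard_normal((16, 8))
new_w_q, new_w_k = compress_qk_product(w_q, w_k, rank=4)

product = w_q @ w_k.T
joint_error = np.linalg.norm(product - new_w_q @ new_w_k.T)
# Baseline: drop the same number of head dimensions from each matrix separately.
naive_error = np.linalg.norm(product - w_q[:, :4] @ w_k[:, :4].T)
```

By the Eckart-Young theorem the truncated SVD is the best rank-4 approximation of the product in Frobenius norm, so `joint_error` can never exceed `naive_error`. That is the intuition for decomposing the product rather than pruning each projection on its own.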

Code & Models

Repositories

bettertmrr/llm-olica (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Big Data and Digital Economy

Methods: Softmax · Linear Layer · Attention Is All You Need · Principal Components Analysis · Multi-Head Attention · Pruning