FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression

Jiayi Tian; Ryan Solgi; Jinming Lu; Yifan Yang; Hai Li; Zheng Zhang

arXiv:2505.23966 · cs.CL · February 9, 2026

TL;DR

FLAT-LLM introduces a training-free, fine-grained low-rank transformation technique for compressing large language models, significantly reducing computational demands while maintaining high accuracy and enabling faster inference.

Contribution

The paper proposes a training-free structural compression method for LLMs that combines eigenvector-based low-rank transformations with adaptive rank redistribution.

Findings

01 Outperforms structural pruning in generalization and downstream tasks

02 Achieves inference speedups over existing decomposition methods

03 Completes calibration within a few minutes

Abstract

Large Language Models (LLMs) have enabled remarkable progress in natural language processing, yet their high computational and memory demands pose challenges for deployment in resource-constrained environments. Although recent low-rank decomposition methods offer a promising path for structural compression, they often suffer from accuracy degradation and expensive calibration procedures, and they produce inefficient model architectures that hinder real-world inference speedups. In this paper, we propose FLAT-LLM, a fast, accurate, and training-free structural compression method based on fine-grained low-rank transformations in the activation space. Specifically, we reduce the hidden dimension by transforming the weights using truncated eigenvectors computed via head-wise Principal Component Analysis, and employ a greedy budget redistribution strategy to adaptively allocate ranks across…
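The head-wise PCA step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the FLAT-LLM repository: the function name `headwise_pca_truncate`, the tensor shapes, and the choice of an uncentered second-moment matrix are all assumptions made for illustration. The idea shown is that, for a single attention head, the top eigenvectors of the value-activation statistics can be folded into the value and output projections so that the head operates in a smaller dimension.

```python
import numpy as np

def headwise_pca_truncate(X, W_v, W_o, rank):
    """Hypothetical helper: compress one attention head with truncated PCA.

    X    : (n_tokens, d_model) calibration inputs to the head
    W_v  : (d_model, d_head)   value projection of this head
    W_o  : (d_head, d_model)   slice of the output projection for this head
    rank : number of principal directions to keep (<= d_head)
    """
    V = X @ W_v                      # value activations, (n_tokens, d_head)
    C = V.T @ V                      # uncentered second moment (assumption)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:rank]     # indices of the largest ones
    Q = eigvecs[:, top]              # truncated eigenvectors, (d_head, rank)
    # Fold Q into both projections: V @ W_o is approximated by (V @ Q) @ (Q.T @ W_o)
    return W_v @ Q, Q.T @ W_o

# Toy usage: value activations that truly live in a 4-dimensional subspace.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16))
W_v = rng.standard_normal((16, 4)) @ rng.standard_normal((4, 8))  # rank-4 projection
W_o = rng.standard_normal((8, 16))
W_v_small, W_o_small = headwise_pca_truncate(X, W_v, W_o, rank=4)
print(W_v_small.shape, W_o_small.shape)
```

In this toy case the activations have exact rank 4, so truncating to rank 4 loses nothing. On real activations the discarded eigenvalues determine the approximation error, which is presumably why the paper pairs the transformation with a rank allocation strategy.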


Code & Models

Repositories

ttttttris/flat-llm
PyTorch · Official

Videos

FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression · Underline

Taxonomy

Methods · Pruning