SkipCat: Rank-Maximized Low-Rank Compression of Large Language Models via Shared Projection and Block Skipping

Yu-Chen Lu; Sheng-Feng Yu; Hui-Hsien Weng; Pei-Shuo Wang; Yu-Fang Hu; Liang Hung-Chun; Hung-Yueh Chiang; Kai-Chiang Wu

arXiv:2512.13494·cs.CL·December 16, 2025

TL;DR

SkipCat is a low-rank compression framework for large language models that retains higher effective ranks and better performance under resource constraints by using shared projections and block skipping.

Contribution

The paper presents SkipCat, a new low-rank compression method with shared projections and block skipping, enabling higher ranks and improved accuracy without additional fine-tuning.

Findings

01. Outperforms previous methods by 7% in accuracy on zero-shot tasks.

02. Achieves better compression efficiency with shared projections.

03. Retains higher effective ranks under the same compression budget.

Abstract

Large language models (LLMs) have achieved remarkable performance across a wide range of tasks. However, their substantial parameter sizes pose significant challenges for deployment on edge devices with limited computational and memory resources. Low-rank compression is a promising approach to address this issue, as it reduces both computational and memory costs, making LLMs more suitable for resource-constrained environments. Nonetheless, naive low-rank compression methods require a significant reduction in the retained rank to achieve meaningful memory and computation savings. For a low-rank model, the ranks need to be reduced by more than half to yield efficiency gains. Such aggressive truncation, however, typically results in substantial performance degradation. To address this trade-off, we propose SkipCat, a novel low-rank compression framework that enables the use of higher ranks…
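
The "more than half" claim in the abstract follows from simple parameter counting, and a short sketch may make it concrete. Replacing a dense m×n weight matrix with a rank-r product A·B (A is m×r, B is r×n) stores r(m+n) values instead of mn, so the factorized form is smaller only when r < mn/(m+n), which is n/2 for a square matrix. The code below illustrates only this plain factorization baseline, not SkipCat itself; the function names and the 4096×4096 layer size are assumptions chosen for illustration.

```python
def low_rank_params(m: int, n: int, r: int) -> int:
    """Values stored by a rank-r factorization W ~ A @ B (A: m x r, B: r x n)."""
    return r * (m + n)


def break_even_rank(m: int, n: int) -> float:
    """Rank at which the factorized form stores as many values as the dense m x n matrix."""
    return (m * n) / (m + n)


# Hypothetical square projection size, chosen only for illustration.
m = n = 4096
dense = m * n

print(f"break-even rank: {break_even_rank(m, n):.0f}")
for r in (1024, 2048, 3072):
    ratio = low_rank_params(m, n, r) / dense
    print(f"rank {r}: {ratio:.2f}x the dense parameter count")
```

Under these assumptions, rank 2048 (half of full rank) merely matches the dense parameter count, and anything above it is larger than the original matrix. The multiply-accumulate count of a matrix-vector product scales the same way, which is why a naive factorization saves nothing until the rank drops below half.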

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Multimodal Machine Learning Applications