PocketLLM: Ultimate Compression of Large Language Models via Meta Networks

Ye Tian; Chengcheng Wang; Jing Han; Yehui Tang; Kai Han

arXiv:2511.17637 · cs.LG · November 25, 2025

TL;DR

PocketLLM is a meta-network-based compression technique that encodes LLM weights into discrete latent vectors drawn from a compact codebook, so that a model can be stored and transmitted as a small decoder, a codebook, and an index with minimal accuracy loss.

Contribution

The paper presents a new compression method for LLMs using latent space encoding with meta-networks, outperforming traditional quantization and pruning methods at high compression ratios.

Findings

01. Compresses Llama 2-7B by 10x with negligible accuracy drop

02. Achieves superior performance at high compression ratios

03. Uses a simple encoder-decoder architecture with a codebook
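
To give a feel for what a ratio near 10x implies, the following is a rough storage-accounting sketch for a codebook-based scheme. It is not taken from the paper: the function names and every parameter value (sub-vector length `d`, codebook size `K`, latent width, decoder size, 16-bit baseline) are hypothetical choices for illustration only.

```python
import math

def compressed_bits(num_weights, d, K, latent_dim, decoder_params, bits_per_param=16):
    """Total bits for index + codebook + decoder (hypothetical layout, not the paper's)."""
    num_vectors = math.ceil(num_weights / d)            # one index per length-d sub-vector
    index_bits = num_vectors * math.ceil(math.log2(K))  # each index addresses K entries
    codebook_bits = K * latent_dim * bits_per_param     # codebook stored at full precision
    decoder_bits = decoder_params * bits_per_param      # small decoder network
    return index_bits + codebook_bits + decoder_bits

def compression_ratio(num_weights, d, K, latent_dim, decoder_params, bits_per_param=16):
    """Original size (all weights at bits_per_param) divided by compressed size."""
    original_bits = num_weights * bits_per_param
    return original_bits / compressed_bits(
        num_weights, d, K, latent_dim, decoder_params, bits_per_param
    )

# Illustrative numbers only: one 4096 x 4096 matrix, 8-wide sub-vectors,
# a 4096-entry codebook, and a 10,000-parameter decoder.
ratio = compression_ratio(4096 * 4096, d=8, K=4096, latent_dim=8, decoder_params=10_000)
print(f"approximate compression ratio: {ratio:.2f}x")
```

With these made-up settings the index dominates the cost at 12 bits per 8 weights (1.5 bits per weight), and the total comes out a little over 10x smaller than 16-bit storage. The codebook and decoder are amortized over all weights, so their share shrinks as the matrix grows.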

Abstract

As Large Language Models (LLMs) continue to grow in size, storing and transmitting them on edge devices becomes increasingly challenging. Traditional methods like quantization and pruning struggle to achieve extreme compression of LLMs without sacrificing accuracy. In this paper, we introduce PocketLLM, a novel approach to compress LLMs in a latent space via meta-networks. A simple encoder network is proposed to project the weights of LLMs into discrete latent vectors, which are then represented using a compact codebook. A lightweight decoder network is employed to map the codebook's representative vectors back to the original weight space. The compressed model then consists solely of a small decoder, a concise codebook, and an index, which allows significant compression of the large weights in LLMs. Extensive experiments show that PocketLLM achieves superior performance even at significantly high…
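
The data flow described in the abstract (encode weight sub-vectors, snap each latent to its nearest codebook entry, decode from the codebook) can be sketched roughly as below. This is a minimal illustration under strong assumptions, not the authors' implementation: the encoder and decoder are single random, untrained linear maps, the codebook is random, and the names `compress` and `decompress` and all sizes are hypothetical. Training of the meta-networks and codebook, which the method depends on for accuracy, is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def compress(W, enc, codebook, d):
    """Split W into length-d sub-vectors, encode them, return nearest-codebook indices."""
    vecs = W.reshape(-1, d)                      # (N, d) weight sub-vectors
    z = vecs @ enc                               # (N, latent_dim) latent vectors
    # Squared Euclidean distance from every latent to every codebook entry: (N, K)
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).astype(np.uint16)

def decompress(indices, codebook, dec, shape):
    """Look up codebook entries by index and map them back to weight space."""
    z_q = codebook[indices]                      # (N, latent_dim) quantized latents
    return (z_q @ dec).reshape(shape)            # decoded weights in the original shape

# Hypothetical sizes; random stand-ins for the trained encoder, codebook, and decoder.
d, latent_dim, K = 8, 4, 256
W = rng.standard_normal((64, 64)).astype(np.float32)
enc = rng.standard_normal((d, latent_dim)).astype(np.float32)
codebook = rng.standard_normal((K, latent_dim)).astype(np.float32)
dec = rng.standard_normal((latent_dim, d)).astype(np.float32)

idx = compress(W, enc, codebook, d)              # what would be stored, with codebook and dec
W_hat = decompress(idx, codebook, dec, W.shape)  # reconstruction (poor here: nothing is trained)
print(idx.shape, W_hat.shape)
```

Note that only `idx`, `codebook`, and `dec` are needed for reconstruction, which matches the abstract's description of the compressed model as a decoder, a codebook, and an index; the encoder is used only at compression time.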

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Natural Language Processing Techniques