Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models

Zhongpan Tang

arXiv:2512.16963·cs.LG·December 23, 2025

TL;DR

This paper introduces 'Compression is Routing': a Transformer Autoencoder whose reconstruction error serves as an intrinsic routing signal for modular language models, aimed at scalable expert scheduling and ultra-long-context handling.

Contribution

The paper proposes an architecture that routes inputs to experts by reconstruction error, removing the need for an explicitly trained gating mechanism and improving scalability in modular language models.

Findings

01 Achieved 64x sequence length compression with high in-domain accuracy

02 Reconstruction error effectively discriminates between in-domain and out-of-distribution data

03 Demonstrated potential for scalable expert scheduling without explicit gating mechanisms
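
The routing idea behind findings 02 and 03 can be illustrated with a minimal sketch. It assumes each expert owns a compressor and that an input is sent to the expert whose compressor reconstructs it best, with a rejection threshold for out-of-distribution inputs. The names `LinearAutoencoder`, `route`, and `ood_threshold` are hypothetical, and a toy linear projection stands in for the paper's Transformer Autoencoder; this is not the author's implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


class LinearAutoencoder:
    """Toy stand-in for one expert's compressor (hypothetical, not the paper's model)."""

    def __init__(self, basis: List[Vector]) -> None:
        # Rows are assumed to be orthonormal; they span the expert's "domain".
        self.basis = basis

    def encode(self, x: Vector) -> List[float]:
        # Compress x to one coefficient per basis row (the latent code).
        return [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in self.basis]

    def decode(self, z: Sequence[float]) -> List[float]:
        # Rebuild an input-sized vector from the latent code.
        dim = len(self.basis[0])
        return [sum(z_k * b[i] for z_k, b in zip(z, self.basis)) for i in range(dim)]

    def reconstruction_error(self, x: Vector) -> float:
        # Mean squared error between the input and its reconstruction.
        x_hat = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def route(
    x: Vector,
    experts: Dict[str, LinearAutoencoder],
    ood_threshold: float,
) -> Tuple[Optional[str], Dict[str, float]]:
    """Pick the expert with the lowest reconstruction error.

    Returns (None, errors) when even the best expert exceeds the
    assumed ood_threshold, i.e. the input looks out-of-distribution.
    """
    errors = {name: ae.reconstruction_error(x) for name, ae in experts.items()}
    best = min(errors, key=errors.get)
    if errors[best] > ood_threshold:
        return None, errors
    return best, errors


# Two toy experts, each able to reconstruct a different 2-D subspace of R^4.
experts = {
    "code": LinearAutoencoder([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]),
    "prose": LinearAutoencoder([[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]),
}

if __name__ == "__main__":
    print(route([1.0, 2.0, 0.0, 0.1], experts, ood_threshold=0.25))
    print(route([1.0, 1.0, 1.0, 1.0], experts, ood_threshold=0.25))
```

No gating network is trained in this sketch: the routing decision falls out of the compressors themselves, and adding an expert only means adding another entry to `experts`. The threshold value would have to be calibrated on held-out data in any real setting.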

Abstract

Current Large Language Models (LLMs) face three major challenges: context length limitations, high inference costs, and catastrophic forgetting during continual learning. While Mixture-of-Experts (MoE) architectures mitigate some of these conflicts, their routing mechanisms typically rely on explicitly trained auxiliary classifiers. This not only increases system complexity but also often lacks interpretability when handling mixed-domain inputs. Building upon the premise that "Compression is Intelligence," this paper proposes a novel architectural philosophy: Compression is Routing. We trained an 87M-parameter end-to-end Transformer Autoencoder, achieving a 64x sequence length compression (compressing 512 tokens into 8 latent vectors). Experimental results demonstrate that this compressor possesses extreme domain discriminative capability: it achieves a reconstruction accuracy of…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Speech Recognition and Synthesis