Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models
Zhongpan Tang

TL;DR
This paper proposes 'Compression is Routing': a Transformer Autoencoder whose reconstruction error serves as an intrinsic routing signal for modular language models, aimed at scalable expert scheduling and ultra-long context handling.
Contribution
It proposes an architecture that routes inputs to experts by reconstruction error rather than through an explicitly trained gating classifier, with the goal of improving scalability in modular language models.
Findings
Achieved 64x sequence length compression (512 tokens into 8 latent vectors) with high in-domain reconstruction accuracy
Reconstruction error effectively discriminates between in-domain and out-of-distribution data
Demonstrated potential for scalable expert scheduling without explicit gating mechanisms
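The routing idea in these findings can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy "experts" are hypothetical stand-ins for trained domain autoencoders (a token survives reconstruction only if it is in the expert's vocabulary), and the names `make_toy_expert`, `reconstruction_accuracy`, `route`, and the `min_accuracy` threshold are all assumptions introduced for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tokens = List[str]
Reconstructor = Callable[[Tokens], Tokens]


def make_toy_expert(vocab: set) -> Reconstructor:
    """Hypothetical stand-in for a domain autoencoder: tokens in the
    expert's vocabulary survive the round trip, all others become <unk>."""
    def reconstruct(tokens: Tokens) -> Tokens:
        return [t if t in vocab else "<unk>" for t in tokens]
    return reconstruct


def reconstruction_accuracy(original: Tokens, rebuilt: Tokens) -> float:
    """Fraction of positions where the reconstruction matches the input."""
    if not original:
        return 0.0
    matches = sum(o == r for o, r in zip(original, rebuilt))
    return matches / len(original)


def route(
    tokens: Tokens,
    experts: Dict[str, Reconstructor],
    min_accuracy: float = 0.8,  # assumed threshold, not from the paper
) -> Tuple[Optional[str], Dict[str, float]]:
    """Pick the expert that reconstructs the input best. No gate is
    trained: the score is a by-product of compression itself. Returns
    None as the choice when every expert reconstructs poorly, which
    treats the input as out-of-distribution."""
    scores = {
        name: reconstruction_accuracy(tokens, fn(tokens))
        for name, fn in experts.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < min_accuracy:
        return None, scores
    return best, scores


experts = {
    "code": make_toy_expert({"def", "return", "x", "(", ")", ":", "+", "1"}),
    "prose": make_toy_expert({"the", "cat", "sat", "on", "mat"}),
}

code_choice, code_scores = route(
    ["def", "x", "(", ")", ":", "return", "x", "+", "1"], experts
)
prose_choice, _ = route(["the", "cat", "sat", "on", "the", "mat"], experts)
ood_choice, _ = route(["la", "vie", "est", "belle"], experts)
```

In this toy setup the code snippet is routed to the `code` expert, the sentence to the `prose` expert, and the input that no expert reconstructs well is flagged by returning `None`. A real system would replace the vocabulary lookup with an encode-then-decode pass through each expert's compressor.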
Abstract
Current Large Language Models (LLMs) face three major challenges: context length limitations, high inference costs, and catastrophic forgetting during continual learning. While Mixture-of-Experts (MoE) architectures mitigate some of these conflicts, their routing mechanisms typically rely on explicitly trained auxiliary classifiers. This not only increases system complexity but also often lacks interpretability when handling mixed-domain inputs. Building upon the premise that "Compression is Intelligence," this paper proposes a novel architectural philosophy: Compression is Routing. We trained an 87M-parameter end-to-end Transformer Autoencoder, achieving a 64x sequence length compression (compressing 512 tokens into 8 latent vectors). Experimental results demonstrate that this compressor possesses extreme domain discriminative capability: it achieves a reconstruction accuracy of…
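The 64x figure follows directly from the block sizes given in the abstract (512 tokens, 8 latent vectors). The sketch below works through that arithmetic for longer inputs. Splitting a long input into fixed 512-token blocks with a padded final block is an assumption made for illustration, and `compressed_length` is a hypothetical helper, not something taken from the paper.

```python
import math

BLOCK_TOKENS = 512   # tokens per block, as stated in the abstract
BLOCK_LATENTS = 8    # latent vectors per block, as stated in the abstract

# Sequence length compression ratio: 512 / 8
ratio = BLOCK_TOKENS // BLOCK_LATENTS


def compressed_length(num_tokens: int) -> int:
    """Latent vectors needed if the input is split into fixed 512-token
    blocks, with the last block padded. The chunking scheme is an
    assumption for this example."""
    blocks = math.ceil(num_tokens / BLOCK_TOKENS)
    return blocks * BLOCK_LATENTS


single_block = compressed_length(512)
long_context = compressed_length(32768)
```

Under this assumption a 32,768-token input occupies 64 blocks and is represented by 512 latent vectors, which is the length of a single uncompressed block.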
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Speech Recognition and Synthesis
