Rate-Distortion Optimization for Transformer Inference

Anderson de Andrade; Alon Harell; Ivan V. Bajić

arXiv:2601.22002·cs.LG·April 21, 2026

TL;DR

This paper introduces a rate-distortion framework for lossy compression of transformer intermediate representations, enabling more efficient inference by balancing bitrate and accuracy.

Contribution

The paper presents an information-theoretic approach to compressing the intermediate representations of transformers, and derives bounds on the achievable rate of learnable codecs along with an analysis of their rate-distortion behavior.

Findings

01

On language benchmarks, the simplest of the proposed codecs achieves substantial rate savings and outperforms more complex methods.

02

The rate-distortion behavior of transformers can be characterized and bounded.

03

The framework offers a unified lens for understanding performance in representation coding.

Abstract

Transformers achieve superior performance on many tasks, but impose heavy compute and memory requirements during inference. This inference can be made more efficient by partitioning the process across multiple devices, which, in turn, requires compressing its intermediate representations. We introduce a principled rate-distortion-based framework for lossy compression that learns compact encodings that explicitly trade bitrate for accuracy. Experiments on language benchmarks show that the simplest of the proposed codecs achieves substantial rate savings, outperforming more complex methods. We characterize and analyze the rate-distortion behaviour of transformers, offering a unified lens for understanding performance in representation coding. This formulation extends information-theoretic concepts to derive bounds on the achievable rate of learnable codecs. For different architectures and…
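
The trade-off between bitrate and accuracy described in the abstract can be illustrated with a toy example. The sketch below is not the paper's codec: it swaps in a fixed uniform scalar quantizer for the learned codecs, uses synthetic Gaussian values as a stand-in for intermediate activations, estimates rate by empirical entropy, and measures distortion as mean squared error rather than task accuracy. All function names and the weight `lam` are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import Counter


def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer bin index."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from the bin indices."""
    return [i * step for i in indices]


def empirical_entropy_bits(symbols):
    """Empirical entropy in bits per symbol (proxy for an ideal entropy coder's rate)."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rd_point(values, step):
    """Return (rate, distortion) for one quantizer step size."""
    indices = quantize(values, step)
    reconstruction = dequantize(indices, step)
    return empirical_entropy_bits(indices), mse(values, reconstruction)


def rd_cost(values, step, lam):
    """Lagrangian cost R + lam * D; lam is an assumed trade-off weight."""
    rate, distortion = rd_point(values, step)
    return rate + lam * distortion


# Synthetic stand-in for an intermediate representation (assumption: unit Gaussian).
rng = random.Random(0)
activations = [rng.gauss(0.0, 1.0) for _ in range(10000)]

steps = [0.125, 0.25, 0.5, 1.0, 2.0]
curve = [(s, *rd_point(activations, s)) for s in steps]
for s, rate, distortion in curve:
    print(f"step={s:<6} rate={rate:.3f} bits/value  mse={distortion:.5f}")

# Pick the step that minimizes the Lagrangian cost for one assumed weight.
lam = 10.0
best_step = min(steps, key=lambda s: rd_cost(activations, s, lam))
print("best step for lam =", lam, "is", best_step)
```

Coarser steps lower the estimated rate and raise the distortion, so sweeping the step size traces out an operational rate-distortion curve. Changing `lam` moves the selected operating point along that curve.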
