Optical Context Compression Is Just (Bad) Autoencoding

Ivan Yee Lee; Cheng Yang; Taylor Berg-Kirkpatrick

arXiv:2512.03643 · cs.CV · April 7, 2026

TL;DR

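This paper critically evaluates optical context compression. Simple direct methods (near-zero-parameter mean pooling and a learned hierarchical encoder) match or surpass DeepSeek-OCR's vision encoder at text reconstruction, and in language modeling the vision encoder performs comparably to truncation and loses to the hierarchical encoder.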

Contribution

The paper shows that rendering text to pixels and compressing it with a vision encoder does not outperform simple baselines that compress token embeddings directly, challenging the excitement around optical context compression.

Findings

01. The vision encoder does not outperform simple mean pooling or a learned hierarchical encoder.

02. Simple direct methods match or surpass the vision encoder at reconstruction at every compression ratio.

03. In language modeling, the vision encoder performs comparably to truncation and loses to the hierarchical encoder at every compression ratio.

Abstract

DeepSeek-OCR shows that rendered text can be reconstructed from a small number of vision tokens, sparking excitement about using vision as a compression medium for long textual contexts. But this pipeline requires rendering token embeddings to pixels and compressing from there -- discarding learned representations in favor of an image the vision encoder must then recover from. We ask whether this detour helps. Comparing DeepSeek-OCR's vision encoder against near-zero-parameter mean pooling and a learned hierarchical encoder, we find it does not. For reconstruction, simple direct methods match or surpass vision at every compression ratio. For language modeling, vision performs comparably to truncation -- a baseline that simply discards context -- and loses to the hierarchical encoder at every compression ratio. As expected, all compression methods outperform truncation for factual…
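To make the two simplest baselines named in the abstract concrete, here is a minimal sketch of mean pooling and truncation over a sequence of token embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names `mean_pool` and `truncate`, the plain-list representation of embeddings, the handling of a trailing partial window, and the choice to keep the most recent tokens when truncating are all hypothetical.

```python
from typing import List

Vector = List[float]


def mean_pool(embeddings: List[Vector], ratio: int) -> List[Vector]:
    """Average each run of `ratio` consecutive embeddings into one vector.

    Assumption: a trailing partial window is averaged over the vectors
    it actually contains rather than being padded or dropped.
    """
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    pooled = []
    for start in range(0, len(embeddings), ratio):
        window = embeddings[start:start + ratio]
        dim = len(window[0])
        # Component-wise mean over the vectors in this window.
        pooled.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return pooled


def truncate(embeddings: List[Vector], ratio: int) -> List[Vector]:
    """Discard context so that roughly len/ratio embeddings remain.

    Assumption: the most recent embeddings are the ones kept; the
    abstract only says that truncation "simply discards context".
    """
    keep = max(1, len(embeddings) // ratio)
    return embeddings[-keep:]


# Four 2-dimensional toy embeddings compressed at a 2x ratio.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(mean_pool(toy, 2))  # [[2.0, 3.0], [6.0, 7.0]]
print(truncate(toy, 2))   # [[5.0, 6.0], [7.0, 8.0]]
```

Both functions return the same number of vectors at a given ratio, which is what makes them comparable: mean pooling keeps a blurred summary of every token, while truncation keeps exact embeddings for only part of the context.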

Code & Models

Repositories

ivnle/bad-autoencoding (GitHub)

Models

ivnle/bad-autoencoding (Hugging Face)
