Visual Text Compression as Measure Transport

Lv Tang; Tianyi Zheng; Yang Liu; Bo Li; Xingyu Li

arXiv:2605.06708·cs.CV·May 11, 2026


TL;DR

This paper introduces a measure transport framework for visual text compression (VTC) that quantifies task-relevant information loss, supporting label-free routing between the text and visual paths and transport-informed re-encoding for long-context tasks.

Contribution

It formulates VTC as a measure transport problem, providing a label-free criterion for path selection and a re-encoding mechanism based on transport costs.
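As a rough illustration of what routing on a transport cost could look like, the sketch below computes the mean squared displacement of a set of points under a map and picks a path by comparing that cost to a threshold. Everything here is an assumption made for illustration: the function names `pushforward_cost` and `choose_path`, the squared Euclidean cost, and the threshold rule are hypothetical and are not taken from the paper, whose actual criterion is not given on this page.

```python
from math import fsum


def pushforward_cost(source, mapped):
    """Mean squared Euclidean distance between each source point and its image.

    `source` and `mapped` are equal-length sequences of equal-dimension
    coordinate tuples; mapped[i] is the image of source[i] under some map.
    Hypothetical helper, not the paper's cost.
    """
    if not source or len(source) != len(mapped):
        raise ValueError("source and mapped must be non-empty and equal length")
    total = fsum(
        (a - b) ** 2
        for x, y in zip(source, mapped)
        for a, b in zip(x, y)
    )
    return total / len(source)


def choose_path(cost, threshold):
    """Route to the visual path when the cost is at or below the threshold.

    The threshold rule is an assumed stand-in for a label-free criterion.
    """
    return "visual" if cost <= threshold else "text"


# Toy 2-D points: the second point moves by one unit along the second axis.
source = [(0.0, 0.0), (1.0, 1.0)]
mapped = [(0.0, 0.0), (1.0, 2.0)]
cost = pushforward_cost(source, mapped)
```

With these toy points the cost is 0.5, so a threshold of 1.0 selects the visual path and a threshold of 0.25 falls back to the text path. No labels are needed to evaluate either function, which is the only property this sketch is meant to convey.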

Findings

01

The label-free routing rule matches the oracle on 70.8% of datasets.

02

Transport-informed re-encoding improves task scores by 3.3%.

03

VTC achieves 3-20x token reduction with controlled information loss.

Abstract

Visual text compression (VTC) promises efficient long-context processing by rendering text into an image and re-encoding it with a vision-language model, often producing $3$--$20\times$ fewer decoder tokens than subword tokenization. Yet token savings do not translate predictably into downstream utility: on some tasks the visual path matches or exceeds the text path, on others it collapses, and the compression ratio itself does not predict which regime will occur. The missing quantity is therefore not another summary of efficiency, but a principled measure of task-relevant information loss induced by visual encoding. We address this problem by formulating VTC in the language of measure transport. Treating text and visual tokens as empirical probability measures, we show that the ViT patch encoder induces a push-forward map whose transport cost decomposes into a precision cost from…
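For readers unfamiliar with the terminology, the standard textbook definitions behind "empirical measure", "push-forward", and "transport cost" can be written as follows. The symbols $x_i$, $T$, and $c$ are generic notation assumed here and may differ from the paper's; the paper's own decomposition of the cost is cut off in the abstract above and is not reproduced.

```latex
% Empirical measure on n points x_1, ..., x_n (uniform weights assumed)
\mu = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}

% Push-forward of \mu by a measurable map T
(T_{\#}\mu)(B) = \mu\bigl(T^{-1}(B)\bigr)
\quad\Longrightarrow\quad
T_{\#}\mu = \frac{1}{n} \sum_{i=1}^{n} \delta_{T(x_i)}

% Cost of transporting \mu along T, for a ground cost c
\int c\bigl(x, T(x)\bigr)\, \mathrm{d}\mu(x)
= \frac{1}{n} \sum_{i=1}^{n} c\bigl(x_i, T(x_i)\bigr)
```

For an empirical measure the integral reduces to a finite average, which is why such a cost can be evaluated directly on token embeddings without task labels.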
