Visual Text Compression as Measure Transport
Lv Tang, Tianyi Zheng, Yang Liu, Bo Li, Xingyu Li

TL;DR
This paper introduces a measure transport framework for visual text compression (VTC) that quantifies task-relevant information loss, yielding a label-free rule for choosing between the text and visual paths and a transport-informed re-encoding strategy for long-context NLP tasks.
Contribution
It formulates VTC as a measure transport problem, providing a label-free criterion for path selection and a re-encoding mechanism based on transport costs.
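For readers new to the vocabulary, the standard objects behind a measure transport formulation can be written as follows. This is only a sketch of textbook definitions: the symbols ($x_i$, $y_j$, $T$, $c$) are assumed here for illustration and are not taken from the paper, and the ground cost $c$ presumes the two token sets can be compared in a common space. The paper's own cost and its decomposition are not reproduced in this summary.

```latex
% Empirical measures over n text-token and m visual-token embeddings (notation assumed)
\mu = \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i}, \qquad
\nu = \frac{1}{m}\sum_{j=1}^{m} \delta_{y_j}

% Push-forward of \mu under a measurable map T
(T_{\#}\mu)(B) = \mu\bigl(T^{-1}(B)\bigr) \quad \text{for every measurable set } B

% Kantorovich transport cost for a ground cost c
\mathrm{OT}_c(\mu,\nu) = \min_{\pi \in \Pi(\mu,\nu)} \sum_{i=1}^{n}\sum_{j=1}^{m} \pi_{ij}\, c(x_i, y_j),
\qquad
\Pi(\mu,\nu) = \Bigl\{ \pi \in \mathbb{R}_{\ge 0}^{n \times m} :
  \sum_{j=1}^{m} \pi_{ij} = \tfrac{1}{n},\ \sum_{i=1}^{n} \pi_{ij} = \tfrac{1}{m} \Bigr\}
```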
Findings
The label-free routing rule matches the oracle on 70.8% of datasets.
Transport-informed re-encoding improves task scores by 3.3%.
VTC achieves 3-20x token reduction with controlled information loss.
Abstract
Visual text compression (VTC) promises efficient long-context processing by rendering text into an image and re-encoding it with a vision-language model, often producing 3-20x fewer decoder tokens than subword tokenization. Yet token savings do not translate predictably into downstream utility: on some tasks the visual path matches or exceeds the text path, on others it collapses, and the compression ratio itself does not predict which regime will occur. The missing quantity is therefore not another summary of efficiency, but a principled measure of task-relevant information loss induced by visual encoding. We address this problem by formulating VTC in the language of measure transport. Treating text and visual tokens as empirical probability measures, we show that the ViT patch encoder induces a push-forward map whose transport cost decomposes into a precision cost from…
