The Lossy Horizon: Error-Bounded Predictive Coding for Lossy Text Compression (Episode I)
Nnamdi Aghanya, Jun Li, Kewei Wang

TL;DR
This paper introduces Error-Bounded Predictive Coding (EPC), a lossy text compression method in which a masked language model predicts masked tokens and the codec stores a minimal, rank-based correction only when the model's top prediction is wrong. This trades fidelity against compression and outperforms the Predictive Masking baseline.
Contribution
EPC is the first lossy text codec leveraging MLMs with residual corrections for continuous rate-distortion control, demonstrating superior performance over simpler baselines.
Findings
EPC achieves higher fidelity at lower bit rates.
EPC outperforms the Predictive Masking baseline.
EPC efficiently utilizes the model's knowledge for compression.
Abstract
Large Language Models (LLMs) can achieve near-optimal lossless compression by acting as powerful probability models. We investigate their use in the lossy domain, where reconstruction fidelity is traded for higher compression ratios. This paper introduces Error-Bounded Predictive Coding (EPC), a lossy text codec that leverages a Masked Language Model (MLM) as a decompressor. Instead of storing a subset of original tokens, EPC allows the model to predict masked content and stores minimal, rank-based corrections only when the model's top prediction is incorrect. This creates a residual channel that offers continuous rate-distortion control. We compare EPC to a simpler Predictive Masking (PM) baseline and a transform-based Vector Quantisation with a Residual Patch (VQ+RE) approach. Through an evaluation that includes precise bit accounting and rate-distortion analysis, we demonstrate that…
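The rank-based correction idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The `toy_ranker` function is a hypothetical stand-in for a masked language model (it returns a fixed candidate ordering and ignores context), and the names `encode`, `decode`, `MASK`, and `VOCAB` are assumptions introduced only for illustration. The sketch shows the core mechanism: a masked token costs nothing extra when the predictor's top guess is right, and otherwise the encoder stores the rank of the true token in the predictor's candidate list.

```python
from typing import Callable, Dict, List, Sequence, Tuple

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary (assumption)

# A ranker maps (masked sequence, position) to candidates, best guess first.
Ranker = Callable[[Sequence[str], int], List[str]]


def toy_ranker(masked: Sequence[str], pos: int) -> List[str]:
    """Hypothetical stand-in for an MLM: fixed ranking, context is ignored."""
    return list(VOCAB)


def encode(tokens: Sequence[str], masked_positions: Sequence[int],
           ranker: Ranker) -> Tuple[List[str], Dict[int, int]]:
    """Mask the chosen positions; store a rank only where the top guess is wrong."""
    hidden = set(masked_positions)
    masked = [MASK if i in hidden else t for i, t in enumerate(tokens)]
    corrections: Dict[int, int] = {}
    for pos in sorted(hidden):
        # list.index raises ValueError if the token is outside the candidate list;
        # this sketch assumes the ranker covers every token it is asked about.
        rank = ranker(masked, pos).index(tokens[pos])
        if rank != 0:
            corrections[pos] = rank
    return masked, corrections


def decode(masked: Sequence[str], corrections: Dict[int, int],
           ranker: Ranker) -> List[str]:
    """Fill each mask with the candidate at the stored rank (rank 0 if none stored)."""
    out = list(masked)
    for pos, tok in enumerate(masked):
        if tok == MASK:
            out[pos] = ranker(masked, pos)[corrections.get(pos, 0)]
    return out


tokens = ["the", "cat", "sat", "on", "the", "mat"]
masked, corrections = encode(tokens, [0, 2, 4], toy_ranker)
restored = decode(masked, corrections, toy_ranker)
top1_only = decode(masked, {}, toy_ranker)  # decoding with no corrections at all
```

In this toy run only position 2 needs a correction, because the fixed ranker's top guess happens to match the other two masked tokens. Decoding with every correction recovers the input exactly, while decoding with none falls back to the predictor's top guesses. How the paper selects which corrections to keep in order to control rate and distortion is not stated in the text above, so the sketch does not model it.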
