Real-Time Text Transmission via LLM-Based Entropy Coding over Fixed-Rate Channels
Vishnu Teja Kunde, Jean-Francois Chamberland, Krishna R. Narayanan, Jamison Ebert

TL;DR
This paper explores the tradeoff between compression efficiency and delay in real-time text transmission using various coding schemes with language models, validated across different model scales.
Contribution
It compares multiple coding schemes within a predict-then-code framework for real-time text transmission, analyzing their delay and compression tradeoffs with large language models.
Findings
Huffman coding is practical for over-provisioned channels with zero delay.
Arithmetic coding achieves near-optimal compression but introduces decodability delay.
Scaling up the language model reduces bits per character by about 38%, which in turn changes which coding scheme is optimal.
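To illustrate why Huffman coding suits over-provisioned channels, the sketch below computes Huffman codeword lengths for a next-symbol distribution and compares the average length with the entropy (the Shannon ideal). The function names and the toy distributions are assumptions for illustration, not taken from the paper.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return the Huffman codeword length (in bits) for each symbol index."""
    if len(probs) == 1:
        return [1]  # a lone symbol still needs one bit on the wire
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def entropy_bits(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Expected codeword length in bits per symbol."""
    return sum(p * l for p, l in zip(probs, lengths))

# Toy distributions (hypothetical): one dyadic, one sharply peaked.
dyadic = [0.5, 0.25, 0.125, 0.125]
peaked = [0.9, 0.05, 0.05]

for name, probs in [("dyadic", dyadic), ("peaked", peaked)]:
    lengths = huffman_lengths(probs)
    print(name, lengths,
          f"avg={average_length(probs, lengths):.3f}",
          f"entropy={entropy_bits(probs):.3f}")
```

For the dyadic distribution the Huffman average matches the entropy exactly. For the peaked one, every symbol costs at least one bit, so the average sits well above the entropy. That gap is the price of emitting a whole codeword per symbol with no algorithmic delay.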
Abstract
Learning, prediction, and compression are intimately connected: a model that accurately predicts the next symbol in a sequence can be coupled with a source coder to compress that sequence near its information-theoretic limit. When tokenized characters arriving at a fixed reading pace are encoded into variable-length codewords and streamed over a fixed-rate channel, a queue forms whose per-token delay depends on the mean and variance of the bit lengths and on the coder's algorithmic latency. This paper investigates the compression–delay tradeoff that arises when a causal language model serves as the sequential predictor within a predict-then-code architecture for real-time text transmission. Several coding schemes are compared: Shannon (ideal), Huffman, arithmetic coding, rANS at various block sizes, and gzip. The analysis separates algorithmic delay, inherent to the coder, from…
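The queueing effect described in the abstract can be sketched with a simple slotted model: one codeword arrives per token period, and the channel drains a fixed number of bits per period. The model below is an assumption made for illustration (FIFO queue, fluid service, no algorithmic latency); the function name and the sample bit lengths are hypothetical and not taken from the paper.

```python
def simulate_delays(bit_lengths, rate_bits_per_token):
    """Per-token queueing delay, in token periods, over a fixed-rate channel.

    Assumed model: each codeword joins a FIFO backlog at the start of its
    token period, and the channel then sends `rate_bits_per_token` bits
    before the next token arrives.
    """
    backlog = 0.0
    delays = []
    for bits in bit_lengths:
        backlog += bits
        # Time until the last bit of this codeword has left the queue.
        delays.append(backlog / rate_bits_per_token)
        # Channel drains for one token period; backlog cannot go negative.
        backlog = max(0.0, backlog - rate_bits_per_token)
    return delays

def mean(values):
    return sum(values) / len(values)

# Two hypothetical codeword-length sequences with the same mean (4 bits)
# but different variance, sent over the same 5-bit-per-token channel.
steady = [4, 4, 4, 4, 4, 4]
bursty = [8, 0, 8, 0, 8, 0]
rate = 5

print("steady mean delay:", mean(simulate_delays(steady, rate)))
print("bursty mean delay:", mean(simulate_delays(bursty, rate)))
```

Both sequences use the same average number of bits, yet the bursty one waits longer on average, which is consistent with the abstract's point that delay depends on the variance of the bit lengths as well as their mean. A coder's algorithmic latency would add to these delays and is not modeled here.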
