Training LLMs over Neurally Compressed Text
Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, and Noah Constant

TL;DR
This paper introduces a novel neural compression method called Equal-Info Windows that enables training large language models directly on highly compressed text, improving efficiency and inference speed despite some trade-offs in perplexity.
Contribution
The paper proposes Equal-Info Windows, a neural text compression technique that allows LLMs to be trained effectively on neurally compressed data, outperforming byte-level baselines in speed and efficiency.
Findings
Effective learning over neurally compressed text demonstrated
Outperforms byte-level baselines in perplexity and inference speed
Shorter sequence lengths reduce latency and autoregressive steps
Abstract
In this paper, we explore the idea of training large language models (LLMs) over highly compressed text. While standard subword tokenizers compress text by a small factor, neural text compressors can achieve much higher rates of compression. If it were possible to train LLMs directly over neurally compressed text, this would confer advantages in training and serving efficiency, as well as easier handling of long text spans. The main obstacle to this goal is that strong compression tends to produce opaque outputs that are not well-suited for learning. In particular, we find that text naïvely compressed via Arithmetic Coding is not readily learnable by LLMs. To overcome this, we propose Equal-Info Windows, a novel compression technique whereby text is segmented into blocks that each compress to the same bit length. Using this method, we demonstrate effective learning over neurally…
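To make the segmentation idea in the abstract concrete, here is a minimal sketch of splitting text into windows that each fit the same bit budget. It is not the paper's implementation. It assumes a static character unigram model fit to the input itself in place of a neural compressor, and it uses the ideal code length -log2 p(c) in place of a real arithmetic coder. The names `char_costs`, `equal_info_windows`, and `bits_per_window` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def char_costs(text):
    """Ideal code length in bits, -log2 p(c), under a unigram model fit to `text`.

    Assumption: this stands in for the per-symbol cost a neural model
    plus arithmetic coder would assign.
    """
    counts = Counter(text)
    total = len(text)
    return {c: -math.log2(n / total) for c, n in counts.items()}


def equal_info_windows(text, bits_per_window=16.0):
    """Greedily split `text` into windows whose estimated cost fits the budget.

    A window is closed as soon as adding the next character would exceed
    `bits_per_window`. A single character that costs more than the budget
    still gets a window of its own.
    """
    costs = char_costs(text)
    windows, current, used = [], [], 0.0
    for ch in text:
        cost = costs[ch]
        if current and used + cost > bits_per_window:
            windows.append("".join(current))
            current, used = [], 0.0
        current.append(ch)
        used += cost
    if current:
        windows.append("".join(current))
    return windows


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 3
    for window in equal_info_windows(sample, bits_per_window=16.0):
        print(repr(window))
```

In this sketch, windows made of frequent characters span more text than windows made of rare ones, since each window carries roughly the same amount of estimated information. A real pipeline would emit actual coded bits for each window rather than only estimating their count.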
Taxonomy
Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
