Training LLMs over Neurally Compressed Text

Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, and Noah Constant

arXiv:2404.03626 · cs.CL · December 16, 2024 · 1 citation

TL;DR

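This paper introduces Equal-Info Windows, a compression technique that segments text into blocks that each compress to the same bit length, so that large language models can be trained directly on neurally compressed text. The resulting models outperform byte-level baselines on perplexity and inference speed, though their perplexity is worse than that of subword-tokenizer models with the same parameter count.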

Contribution

The paper proposes Equal-Info Windows, a neural text compression technique whose output remains learnable by LLMs, and shows that models trained on this compressed text outperform byte-level baselines in perplexity and inference speed.

Findings

01. Effective learning over neurally compressed text is demonstrated.

02. The approach outperforms byte-level baselines in perplexity and inference speed.

03. Shorter sequence lengths require fewer autoregressive generation steps and reduce latency.

Abstract

In this paper, we explore the idea of training large language models (LLMs) over highly compressed text. While standard subword tokenizers compress text by a small factor, neural text compressors can achieve much higher rates of compression. If it were possible to train LLMs directly over neurally compressed text, this would confer advantages in training and serving efficiency, as well as easier handling of long text spans. The main obstacle to this goal is that strong compression tends to produce opaque outputs that are not well-suited for learning. In particular, we find that text naïvely compressed via Arithmetic Coding is not readily learnable by LLMs. To overcome this, we propose Equal-Info Windows, a novel compression technique whereby text is segmented into blocks that each compress to the same bit length. Using this method, we demonstrate effective learning over neurally…
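The segmentation idea in the abstract (blocks that each compress to the same bit length) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a static character-frequency model as a stand-in for the neural language model, and it uses the ideal code length, the sum of -log2 p over the characters, as a proxy for the output length of an arithmetic coder. The names `unigram_model`, `equal_info_windows`, and `bits_per_window` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def unigram_model(text):
    """Assumed stand-in for the neural LM: static character probabilities."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def equal_info_windows(text, probs, bits_per_window=16):
    """Greedily split text into windows whose ideal code length fits the bit budget.

    A real system would run an arithmetic coder per window and pad each
    window's output to exactly bits_per_window bits; here we only track
    the ideal code length to show where the window boundaries fall.
    """
    windows, current, used = [], "", 0.0
    for ch in text:
        cost = -math.log2(probs[ch])  # ideal code length of this character, in bits
        # Close the window when the next character would overflow the budget.
        # A single character costing more than the budget still gets its own window.
        if current and used + cost > bits_per_window:
            windows.append(current)
            current, used = "", 0.0
        current += ch
        used += cost
    if current:
        windows.append(current)
    return windows


text = "the quick brown fox jumps over the lazy dog " * 4
probs = unigram_model(text)
windows = equal_info_windows(text, probs, bits_per_window=16)
for w in windows[:5]:
    print(repr(w))
```

Under these assumptions, windows vary in character length but carry roughly the same amount of information: stretches of frequent characters yield longer windows, and stretches of rare characters yield shorter ones.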

Videos

Training LLMs over Neurally Compressed Text · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques
