Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows

Billy Dickson; Zoran Tiganj

arXiv:2510.22109·cs.CL·October 28, 2025

TL;DR

This paper proposes a simple input-level logarithmic compression method inspired by human memory to extend transformer context windows, improving language modeling performance on long texts without modifying the transformer architecture.

Contribution

The paper introduces a logarithmic compression technique applied to the input representation, which lets a standard transformer handle longer contexts without architectural changes.

Findings

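1. Reduces perplexity on the WikiText-103 and PG-19 language modeling benchmarks relative to uncompressed baselines.

2. Performance improves consistently as the compressed temporal context gets longer.

3. Maintains architectural simplicity: the compressed input is processed by a standard, unmodified transformer.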

Abstract

Most approaches to long-context processing increase the complexity of the transformer's internal architecture by integrating mechanisms such as recurrence or auxiliary memory modules. In this work, we introduce an alternative approach that modifies the input representation itself, rather than the transformer architecture. Inspired by cognitive models of human memory, our method applies a scale-invariant logarithmic compression to the input tokens. The resulting compressed representation is processed by a standard, unmodified transformer, preserving architectural simplicity. We evaluate this approach on the WikiText-103 and PG-19 language modeling benchmarks, showing a reduction in perplexity compared to uncompressed baselines. Moreover, performance improves consistently with longer compressed temporal contexts, showing that input-level logarithmic compression is a simple and effective…
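To make the idea of scale-invariant logarithmic compression concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the compression operator, so this example assumes one simple possibility, namely averaging past token embeddings inside bins whose widths grow geometrically with distance from the most recent token. The function names `log_bin_edges` and `compress`, the `base` parameter, and the use of mean pooling are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def log_bin_edges(context_len: int, base: float = 2.0) -> List[Tuple[int, int]]:
    """Split lags 0..context_len-1 into bins of geometrically growing width.

    Lag 0 is the most recent token. Each bin is a half-open range (start, end).
    ASSUMPTION: geometric bin widths are one way to get roughly constant
    resolution per unit of log-lag; the paper may use a different scheme.
    """
    if context_len < 0:
        raise ValueError("context_len must be non-negative")
    if base <= 1.0:
        raise ValueError("base must be greater than 1")
    edges: List[Tuple[int, int]] = []
    start = 0
    width = 1.0
    while start < context_len:
        end = min(context_len, start + max(1, int(width)))
        edges.append((start, end))
        start = end
        width *= base
    return edges


def compress(embeddings: Sequence[Sequence[float]], base: float = 2.0) -> List[List[float]]:
    """Mean-pool embeddings within each log-spaced bin.

    `embeddings` is ordered oldest to newest. The result is also ordered
    oldest to newest, so recent tokens keep fine resolution and distant
    tokens are summarized coarsely.
    ASSUMPTION: mean pooling stands in for the paper's compression operator.
    """
    n = len(embeddings)
    compressed: List[List[float]] = []
    for start, end in log_bin_edges(n, base):
        # Lag k refers to the token at index n - 1 - k.
        chunk = [embeddings[n - 1 - lag] for lag in range(start, end)]
        dim = len(chunk[0])
        compressed.append([sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)])
    compressed.reverse()
    return compressed


if __name__ == "__main__":
    toy = [[float(i)] for i in range(15)]  # 15 one-dimensional "embeddings"
    print(log_bin_edges(len(toy)))
    print(compress(toy))
```

Under these assumptions, the number of compressed slots grows only logarithmically with the raw context length (with `base=2.0`, 1024 past tokens map to 11 slots), which is why a fixed-size transformer input can cover a much longer history. The compressed sequence would then be fed to an unmodified transformer, as the abstract describes.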
