Absorber LLM: Harnessing Causal Synchronization for Test-Time Training

Zhixin Zhang; Shabo Zhang; Chengcan Wu; Zeming Wei; Meng Sun

arXiv:2604.20915 · cs.LG · April 24, 2026

TL;DR

Absorber LLM introduces a self-supervised causal synchronization method that enables long-context retention in transformers, reducing memory use and improving accuracy in long-stream inference.

Contribution

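It proposes a novel approach to long-context retention that absorbs historical context into model parameters through causal synchronization. The approach targets two problems: the memory cost of attending over long inputs, and the token-level overfitting of earlier parameter-as-memory methods such as Test-Time Training (TTT).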
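The memory pressure behind long-stream inference comes largely from the key-value (KV) cache that a decoder-only transformer keeps for every past token. As a rough illustration that is not taken from the paper, the sketch below computes KV-cache size for a hypothetical configuration (32 layers, 8 KV heads, head dimension 128, 16-bit values). It contrasts that size with a fixed-size store, such as an RNN/SSM state or parameters holding absorbed context, whose size does not depend on context length. The helper names `kv_cache_bytes` and `fixed_memory_bytes` and the 16M-element budget are illustrative assumptions.

```python
GIB = 2 ** 30


def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes in a decoder-only transformer's KV cache for one sequence.

    Every layer keeps one key and one value vector per KV head per token,
    so the total grows linearly with seq_len.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len


def fixed_memory_bytes(n_elems, bytes_per_elem=2):
    """Bytes in a fixed-size store; independent of how much context was read."""
    return n_elems * bytes_per_elem


# Hypothetical model configuration, not taken from the paper.
CONFIG = dict(n_layers=32, n_kv_heads=8, head_dim=128)
FIXED_ELEMS = 16 * 2 ** 20  # hypothetical fixed-size budget (16M values)

for n in (4_096, 32_768, 131_072):
    kv_gib = kv_cache_bytes(n, **CONFIG) / GIB
    fixed_gib = fixed_memory_bytes(FIXED_ELEMS) / GIB
    print(f"{n:>7} tokens: KV cache {kv_gib:5.2f} GiB, fixed store {fixed_gib:.2f} GiB")
# KV cache prints 0.50, 4.00 and 16.00 GiB; the fixed store stays at 0.03 GiB.
```

Under these assumptions each token costs 128 KiB of cache, so memory scales linearly with context while the fixed-size store stays constant. Actual savings depend on the model and on what is stored.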

Findings

01

Reduces inference memory in long-context tasks.

02

Improves accuracy over prior parameter-as-memory methods.

03

Effective on long-context and streaming benchmarks.

Abstract

In Transformers, the computational cost of self-attention grows with sequence length, and its memory consumption makes inference on long streams prohibitive. Constant-memory alternatives such as RNNs and SSMs compress history into fixed-size states and thus lose long-tail dependencies. Methods that memorize context in parameters, such as Test-Time Training (TTT), are prone to overfitting token-level projections and fail to preserve the causal effect of the context in pretrained LLMs. We propose Absorber LLM, which formulates long-context retention as self-supervised causal synchronization: after historical context is absorbed into its parameters, the contextless model should match the original model with full context on future generations. We optimize this objective by synchronizing the internal behaviors of the updated model with those of the original, ensuring context…
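The synchronization objective can be illustrated with a deliberately tiny, pure-Python toy. This is a sketch under strong assumptions, not the authors' method. The "pretrained model" is a bigram logit table, and the context's causal effect is faked as an additive logit boost for tokens that appear in the context. The absorbed parameters are a single trainable bias vector `delta`. Synchronization is done on output distributions (mean KL divergence over future positions) rather than on the internal behaviors the paper synchronizes. All names are hypothetical.

```python
import math
import random

VOCAB = 5  # hypothetical toy vocabulary size


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions over the vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def context_effect(context):
    """Hypothetical causal effect of the context: boost tokens it contains."""
    effect = [0.0] * VOCAB
    for tok in context:
        effect[tok] += 0.5
    return effect


def teacher_dist(weights, context, prev):
    """Frozen original model WITH full context (context faked as a logit shift)."""
    effect = context_effect(context)
    return softmax([w + e for w, e in zip(weights[prev], effect)])


def student_dist(weights, delta, prev):
    """Contextless model whose trainable bias `delta` holds the absorbed context."""
    return softmax([w + d for w, d in zip(weights[prev], delta)])


def sync_loss(weights, delta, context, future):
    """Mean KL(teacher || student) over the previous tokens at future positions."""
    total = sum(
        kl(teacher_dist(weights, context, t), student_dist(weights, delta, t))
        for t in future
    )
    return total / len(future)


def absorb(weights, context, future, steps=1000, lr=1.0):
    """Gradient descent on delta; d KL(p || q) / d student_logits = q - p."""
    delta = [0.0] * VOCAB
    for _ in range(steps):
        grad = [0.0] * VOCAB
        for t in future:
            p = teacher_dist(weights, context, t)
            q = student_dist(weights, delta, t)
            for i in range(VOCAB):
                grad[i] += (q[i] - p[i]) / len(future)
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta


random.seed(0)
weights = [[random.gauss(0.0, 1.0) for _ in range(VOCAB)] for _ in range(VOCAB)]
context = [1, 1, 3, 4, 1]  # history to absorb, then discard
future = [0, 2, 3, 4]      # previous tokens at later positions
before = sync_loss(weights, [0.0] * VOCAB, context, future)
delta = absorb(weights, context, future)
after = sync_loss(weights, delta, context, future)
print(f"sync loss: {before:.4f} before absorbing, {after:.2e} after")
```

Because the faked context effect is a pure additive logit shift, `delta` can recover it exactly up to a constant, so the loss falls close to zero. In a real LLM the context acts through attention, and the contextless model can only approximate the full-context one.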
