Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation

Jingtao Wang, Yucong Wang, Jun Ding, Rui Cai, and Xun Wang

arXiv:2603.11067 · cs.CL · March 13, 2026

TL;DR

ARACH is a training-free, plug-in method that enhances large language models at inference time by reallocating attention internally, leading to consistent improvements without retraining or parameter updates.

Contribution

Introduces ARACH, a novel inference-time plug-in that reallocates attention within LLMs, offering a new internal intervention strategy distinct from prompt-based or training-based methods.

Findings

01. Consistent performance improvements across multiple tasks.
02. Modest inference overhead with no parameter updates.
03. Mitigates the attention sink phenomenon in LLMs.
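The attention sink phenomenon refers to attention heads placing a disproportionate share of their weight on a few tokens, often the first one, largely regardless of content. A minimal way to quantify it, assuming you already have causal attention scores (rows are queries, columns are keys), is to average the weight each query assigns to position 0. The helper names `causal_attention` and `sink_ratio` are hypothetical and are not taken from the paper; this is a generic diagnostic, not the authors' evaluation protocol.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_attention(scores):
    """Row-wise softmax under a causal mask: query i sees keys 0..i only."""
    return [softmax(row[: i + 1]) for i, row in enumerate(scores)]

def sink_ratio(attn, sink_pos=0):
    """Mean attention weight placed on `sink_pos` (hypothetical diagnostic)."""
    # Skip queries whose only visible key is the sink itself: they give it weight 1 trivially.
    rows = [row for row in attn if len(row) > sink_pos + 1]
    if not rows:
        raise ValueError("need at least one query that can attend past the sink")
    return sum(row[sink_pos] for row in rows) / len(rows)

# Uniform scores vs. scores with a large logit on the first key.
uniform = [[0.0] * 4 for _ in range(4)]
sinky = [[5.0] + [0.0] * 3 for _ in range(4)]
print(round(sink_ratio(causal_attention(uniform)), 4))
print(round(sink_ratio(causal_attention(sinky)), 4))
```

With uniform scores, query i spreads its weight evenly over i + 1 keys, so the ratio stays near 0.36 for this 4-token example; a strong first-key logit pushes it close to 1, which is the kind of concentration the Findings describe as an attention sink.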

Abstract

Large language models (LLMs) achieve remarkable performance, yet further gains often require costly training. This has motivated growing interest in post-training techniques, especially training-free approaches that improve models at inference time without updating weights. Most training-free methods treat the model as a black box and improve outputs via input/output-level interventions, such as prompt design and test-time scaling through repeated sampling, reranking/verification, or search. In contrast, they rarely offer a plug-and-play mechanism to intervene in a model's internal computation. We propose ARACH (Attention Reallocation via an Adaptive Context Hub), a training-free inference-time plug-in that augments LLMs with an adaptive context hub to aggregate context and reallocate attention. Extensive experiments across multiple language modeling tasks show consistent improvements…
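The abstract does not say how the adaptive context hub is constructed, so the following is only an illustrative sketch under our own assumptions, not ARACH itself. It appends one extra "hub" key/value slot to single-query scaled dot-product attention, giving the softmax a context summary it can shift weight onto instead of concentrating on a single token. Here the hub is simply the mean of the visible keys and values (a plain mean-pooling stand-in, not adaptive), and the names `attend_with_hub` and `hub_bias` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mean_vec(vs):
    # Column-wise mean of a list of equal-length vectors.
    n = len(vs)
    return [sum(col) / n for col in zip(*vs)]

def attend_with_hub(q, keys, values, hub_bias=0.0):
    """Single-query attention over `keys` plus one appended hub slot.

    Hypothetical illustration: the hub key/value are mean-pooled from the
    context, and `hub_bias` is added to the hub's logit to control how much
    attention it draws. Returns (output vector, weights with the hub last).
    """
    scale = math.sqrt(len(q))
    hub_k, hub_v = mean_vec(keys), mean_vec(values)
    logits = [dot(q, k) / scale for k in keys]
    logits.append(dot(q, hub_k) / scale + hub_bias)
    weights = softmax(logits)
    all_values = values + [hub_v]
    out = [sum(w * v[j] for w, v in zip(weights, all_values))
           for j in range(len(values[0]))]
    return out, weights

q = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 1.0], [0.0, 1.0]]    # key 0 dominates the scores
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
for bias in (0.0, 2.0):
    _, w = attend_with_hub(q, keys, values, hub_bias=bias)
    print(bias, [round(x, 3) for x in w])
```

Appending a slot, rather than rewriting the existing scores, has a convenient limiting behavior in this sketch: as `hub_bias` becomes very negative the hub's weight vanishes and the result reduces to ordinary attention, while raising it moves weight away from the dominant key toward the context summary.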

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications