Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation
Jingtao Wang, Yucong Wang, Jun Ding, Rui Cai, and Xun Wang

TL;DR
ARACH is a training-free, plug-in method that enhances large language models at inference time by reallocating attention internally, leading to consistent improvements without retraining or parameter updates.
Contribution
Introduces ARACH, a novel inference-time plug-in that reallocates attention within LLMs, offering a new internal intervention strategy distinct from prompt-based or training-based methods.
Findings
Consistent performance improvements across multiple tasks.
Modest inference overhead with no parameter updates.
Mitigates the attention sink phenomenon in LLMs.
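The attention sink is the widely reported tendency of many LLM attention heads to assign a large share of attention to the first token(s), regardless of their content. The sketch below shows one simple way to quantify it. It assumes single-head causal attention over toy 2-D vectors. `sink_ratio` is a hypothetical helper, not a metric taken from the paper: it averages the weight each query places on position 0.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(q, k):
    """Single-head scaled dot-product attention weights with a causal mask:
    query i may only attend to keys 0..i."""
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights.append(softmax(scores))
    return weights


def sink_ratio(weights):
    """Hypothetical diagnostic: mean attention mass that queries 1..n-1 place
    on position 0. Query 0 is skipped because it can only attend to itself."""
    rows = weights[1:]
    return sum(row[0] for row in rows) / len(rows)


# Toy example: the first key aligns strongly with every query, mimicking a sink.
q = [[1.0, 0.0]] * 4
k_sink = [[5.0, 0.0]] + [[0.0, 0.0]] * 3
k_flat = [[0.0, 0.0]] * 4

print(round(sink_ratio(causal_attention_weights(q, k_sink)), 3))  # well above the flat case
print(round(sink_ratio(causal_attention_weights(q, k_flat)), 3))  # prints 0.361
```

With flat keys, each query spreads its attention uniformly, so queries 1 to 3 give 1/2, 1/3 and 1/4 to position 0. In this toy setting, a ratio well above that average signals sink-like behavior.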
Abstract
Large language models (LLMs) achieve remarkable performance, yet further gains often require costly training. This has motivated growing interest in post-training techniques, especially training-free approaches that improve models at inference time without updating weights. Most training-free methods treat the model as a black box and improve outputs via input/output-level interventions, such as prompt design and test-time scaling through repeated sampling, reranking/verification, or search. In contrast, such methods rarely offer a plug-and-play mechanism for intervening in a model's internal computation. We propose ARACH (Attention Reallocation via an Adaptive Context Hub), a training-free inference-time plug-in that augments LLMs with an adaptive context hub to aggregate context and reallocate attention. Extensive experiments across multiple language modeling tasks show consistent improvements…
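The abstract does not specify how the adaptive context hub is built. The sketch below is a purely hypothetical illustration of the general idea of reallocating attention through an extra context-summary slot, offered for intuition only and not as ARACH's actual mechanism. It appends one extra key/value pair to each causal attention step, formed by mean-pooling the keys and values visible to that query. The names `attend_with_hub` and `mean_vec`, as well as the mean-pooling choice, are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attend_with_hub(q, k, v):
    """Toy causal single-head attention in which query i sees keys 0..i plus
    one extra 'hub' slot whose key and value are the mean of keys/values 0..i.
    The hub construction is a hypothetical stand-in, not ARACH's design.
    Returns per-query outputs and attention weights (hub weight last)."""
    d = len(q[0])
    outputs, all_weights = [], []
    for i, qi in enumerate(q):
        keys = k[: i + 1] + [mean_vec(k[: i + 1])]
        vals = v[: i + 1] + [mean_vec(v[: i + 1])]
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in keys]
        w = softmax(scores)
        # Output is the attention-weighted (convex) combination of the values.
        out = [sum(wj * vj[c] for wj, vj in zip(w, vals)) for c in range(len(v[0]))]
        outputs.append(out)
        all_weights.append(w)
    return outputs, all_weights


q = [[1.0, 0.0]] * 4
k = [[5.0, 0.0]] + [[0.0, 0.0]] * 3  # first key dominates, like a sink
v = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
outputs, weights = attend_with_hub(q, k, v)
for w in weights[1:]:
    print(f"first-token mass {w[0]:.3f}, hub mass {w[-1]:.3f}")
```

Softmax renormalizes over all slots, so any added slot with positive weight lowers the share of every other position, including a dominant first token. The abstract does not say whether ARACH relies on this effect.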
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
