TL;DR
SinkTrack is a training-free method that uses the attention sink as an intrinsic anchor to improve context retention and reduce hallucination in large language models during generation.
Contribution
The paper introduces SinkTrack, a context anchoring technique that leverages the attention sink and is presented as simple, effective, and applicable across LLM architectures and scales.
Findings
Mitigates hallucination and context forgetting in LLMs.
Improves performance on textual and multi-modal tasks with significant gains.
Demonstrates robustness and generalizability across different models.
Abstract
Large language models (LLMs) suffer from hallucination and context forgetting. Prior studies suggest that attention drift is a primary cause of these problems: the model's focus shifts towards newly generated tokens and away from the initial input context. To counteract this, we make use of a related, intrinsic characteristic of LLMs: attention sink -- the tendency to consistently allocate high attention to the very first token (i.e., <BOS>) of a sequence. Concretely, we propose an advanced context anchoring method, SinkTrack, which treats <BOS> as an information anchor and injects key contextual features (such as those derived from the input image or instruction) into its representation. As a result, the LLM remains anchored to the initial input context throughout the entire generation process. SinkTrack is training-free, plug-and-play, and introduces negligible inference overhead. Experiments…
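The anchoring idea in the abstract can be illustrated with a toy single-head attention step. This is only a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not say how features are injected, so the `inject_context` blending rule, the `alpha` weight, and all vectors below are hypothetical. It shows that when position 0 (<BOS>) absorbs most of the attention, whatever is written into its value vector dominates the attention output.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

def inject_context(bos_value, context_feature, alpha=0.5):
    """Hypothetical injection rule (an assumption, not from the paper):
    linearly blend a pooled context feature into the <BOS> value vector."""
    return [(1 - alpha) * b + alpha * c for b, c in zip(bos_value, context_feature)]

# Toy sequence: position 0 plays the role of <BOS>. Its key is chosen so that
# it scores far higher than the others, imitating an attention sink.
keys = [[4.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
query = [2.0, 0.0]

# Stand-in for a feature pooled from the input image or instruction.
context_feature = [0.0, 1.0]

# Attention output before any injection: the context dimension is empty.
weights, before = attend(query, keys, values)

# Overwrite only the <BOS> value with the blended vector, then attend again.
anchored_values = [inject_context(values[0], context_feature)] + values[1:]
_, after = attend(query, keys, anchored_values)

print("attention on <BOS>:", round(weights[0], 3))
print("context dimension before/after:", round(before[1], 3), round(after[1], 3))
```

Because the sink position receives nearly all of the attention mass in this toy setup, modifying that single position is enough to carry the context feature into the output, which is the intuition behind using <BOS> as an anchor. A real model would apply this across layers and heads, and the choice of blending rule would matter.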
