Manifold-Guided Attention Steering
Ian Li, Kapilesh Guruprasad, Raunak Sengupta, Ninad Satish, Loris D'Antoni, Rose Yu

TL;DR
Manifold-Guided Attention Steering (MAGS) improves reasoning accuracy in large language models by dynamically correcting attention head deviations from a learned correctness manifold during inference.
Contribution
MAGS introduces a trajectory-aware, geometric approach to attention steering that adapts corrections based on proximity to a correctness manifold, outperforming static methods.
Findings
MAGS outperforms unsteered and static steering baselines across multiple benchmarks.
Correctness manifolds are a general feature of LLM attention geometry.
Trajectory-aware corrections reduce error propagation in reasoning tasks.
Abstract
Large language models frequently produce errors in reasoning tasks despite possessing the underlying knowledge required for correct reasoning. Activation steering is one way to improve reasoning consistency, but existing approaches apply fixed, pre-computed correction vectors and ignore where the model currently sits along its generation trajectory; the result is indiscriminate perturbation that disrupts already-correct steps as freely as erroneous ones. We propose Manifold-Guided Attention Steering (MAGS), a trajectory-aware inference-time intervention grounded in a geometric observation: the output activations of specific attention heads diverge from a low-dimensional correctness manifold at the point of error, and this deviation compounds through subsequent steps. For each identified attention head, we learn a low-dimensional subspace…
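The abstract is cut off before the method details, so the following is only an illustrative sketch of the general idea it describes, not the authors' implementation. It assumes the per-head subspace is an affine PCA subspace fitted on activations from correct generations, that deviation is measured as the norm of the off-subspace residual, and that a correction is applied only when that norm exceeds a threshold. The function names (`fit_subspace`, `deviation`, `steer`) and the `threshold` and `strength` parameters are hypothetical.

```python
import numpy as np


def fit_subspace(correct_acts, k):
    """Fit a k-dimensional affine subspace (mean + top-k principal directions).

    correct_acts: array of shape (n_samples, d) holding one head's output
    activations collected from correct generations (assumed input).
    """
    mean = correct_acts.mean(axis=0)
    # SVD of the centered data; rows of vt are orthonormal principal directions.
    _, _, vt = np.linalg.svd(correct_acts - mean, full_matrices=False)
    return mean, vt[:k]


def deviation(act, mean, basis):
    """Return the off-subspace residual of `act` and its Euclidean norm."""
    centered = act - mean
    projected = basis.T @ (basis @ centered)
    residual = centered - projected
    return residual, float(np.linalg.norm(residual))


def steer(act, mean, basis, threshold, strength=1.0):
    """Pull `act` back toward the subspace only if it has drifted too far.

    Assumption: gating by a fixed threshold stands in for whatever
    proximity-dependent rule the paper actually uses.
    """
    residual, dist = deviation(act, mean, basis)
    if dist <= threshold:
        return act  # close to the subspace: leave the activation untouched
    # Remove a fraction of the off-subspace component.
    return act - strength * residual
```

Under these assumptions, an activation already near the fitted subspace passes through unchanged, which is the contrast with a fixed correction vector that would perturb every step equally.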
