Neuro-RIT: Neuron-Guided Instruction Tuning for Robust Retrieval-Augmented Language Model
Jaemin Kim, Jae O Lee, Sumyeong Ahn, Seo Yeon Park

TL;DR
Neuro-RIT introduces neuron-level alignment and deactivation techniques to improve retrieval-augmented language models' robustness against irrelevant or noisy contexts.
Contribution
The paper presents a neuron-guided instruction tuning framework that explicitly separates neurons that process relevant context from those that process irrelevant context, then deactivates the latter to make the model more robust to noisy retrieval.
Findings
Neuro-RIT outperforms baseline methods on multiple QA benchmarks.
Neuron deactivation improves noise robustness without sacrificing accuracy.
Two-stage tuning effectively balances relevance and evidence distillation.
Abstract
Retrieval-Augmented Language Models (RALMs) have demonstrated significant potential in knowledge-intensive tasks; however, they remain vulnerable to performance degradation when presented with irrelevant or noisy retrieved contexts. Existing approaches to enhance robustness typically operate via coarse-grained parameter updates at the layer or module level, often overlooking the inherent neuron-level sparsity of Large Language Models (LLMs). To address this limitation, we propose Neuro-RIT (Neuron-guided Robust Instruction Tuning), a novel framework that shifts the paradigm from dense adaptation to precision-driven neuron alignment. Our method explicitly disentangles neurons that are responsible for processing relevant versus irrelevant contexts using attribution-based neuron mining. Subsequently, we introduce a two-stage instruction tuning strategy that enforces a dual capability for…
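To make the idea of attribution-based neuron mining more concrete, here is a minimal sketch on a toy one-layer ReLU network. It is not the authors' implementation: the abstract does not specify the attribution score, the selection rule, or the deactivation mechanism, so the activation-times-gradient score, the top-k gap rule, and every name below (`attributions`, `mine_noise_neurons`, `score`, the toy weights `W` and `v`) are assumptions chosen for illustration only.

```python
from typing import Iterable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def hidden(W: Matrix, x: Sequence[float]) -> Vector:
    """ReLU activations of a toy one-layer network (one row of W per neuron)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def attributions(W: Matrix, v: Vector, x: Sequence[float]) -> Vector:
    """Absolute activation-times-gradient attribution per hidden neuron.

    With score = sum_j v[j] * h[j], the gradient d(score)/d(h[j]) is v[j],
    so the attribution of neuron j is |h[j] * v[j]|. (Assumed scoring rule.)
    """
    return [abs(h * vj) for h, vj in zip(hidden(W, x), v)]


def mean_attr(W: Matrix, v: Vector, inputs: Sequence[Sequence[float]]) -> Vector:
    """Average attribution of each neuron over a set of inputs."""
    totals = [0.0] * len(W)
    for x in inputs:
        for j, a in enumerate(attributions(W, v, x)):
            totals[j] += a
    return [t / len(inputs) for t in totals]


def mine_noise_neurons(W, v, relevant, irrelevant, k: int) -> List[int]:
    """Return up to k neurons whose attribution is higher on irrelevant inputs.

    Neurons are ranked by (irrelevant mean - relevant mean); only neurons
    with a positive gap are kept. (Assumed selection rule.)
    """
    rel = mean_attr(W, v, relevant)
    irr = mean_attr(W, v, irrelevant)
    gaps = [(irr[j] - rel[j], j) for j in range(len(W))]
    top = sorted(gaps, key=lambda t: t[0], reverse=True)[:k]
    return sorted(j for g, j in top if g > 0)


def score(W: Matrix, v: Vector, x: Sequence[float],
          deactivated: Iterable[int] = ()) -> float:
    """Network output with the listed neurons masked to zero."""
    off = set(deactivated)
    h = hidden(W, x)
    return sum(v[j] * h[j] for j in range(len(h)) if j not in off)


# Toy setup: input feature 0 stands in for "relevant" signal,
# input feature 1 for "irrelevant" signal.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [1.0, 1.0, 0.5]
relevant = [[1.0, 0.0], [2.0, 0.0]]
irrelevant = [[0.0, 1.0], [0.0, 3.0]]

noise_neurons = mine_noise_neurons(W, v, relevant, irrelevant, k=1)
full = score(W, v, [1.0, 1.0])
masked = score(W, v, [1.0, 1.0], deactivated=noise_neurons)
```

In this toy setup the mined neuron is the one that responds only to the second input feature, and masking it removes that feature's direct contribution to the score. A real RALM would compute attributions over transformer feed-forward neurons with automatic differentiation, and the tuning stages described in the abstract are not modeled here.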
