TL;DR
This paper introduces QRNCA, a new framework for identifying query-relevant neurons in large language models, enabling better understanding and editing of long-form text generation.
Contribution
The study presents a novel, architecture-agnostic method for locating query-relevant neurons in LLMs. It handles long-form text generation and probes whether LLMs contain localized, domain-specific knowledge regions.
Findings
QRNCA outperforms baseline methods in multi-choice QA tasks
Localized neuron regions are observable within different domains
Detected neurons can be used for knowledge editing and prediction
Abstract
Large Language Models (LLMs) possess vast amounts of knowledge within their parameters, prompting research into methods for locating and editing this knowledge. Previous work has largely focused on locating entity-related (often single-token) facts in smaller models. However, several key questions remain unanswered: (1) How can we effectively locate query-relevant neurons in decoder-only LLMs, such as Llama and Mistral? (2) How can we address the challenge of long-form (or free-form) text generation? (3) Are there localized knowledge regions in LLMs? In this study, we introduce Query-Relevant Neuron Cluster Attribution (QRNCA), a novel architecture-agnostic framework capable of identifying query-relevant neurons in LLMs. QRNCA allows for the examination of long-form answers beyond triplet facts by employing the proxy task of multi-choice question answering. To evaluate the effectiveness…
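The abstract says QRNCA handles long-form answers by using multi-choice question answering as a proxy task. The minimal sketch below illustrates that idea only: each long-form candidate answer is mapped to an option letter, so the quantity to attribute becomes the prediction of a single letter. The prompt template and the function names `build_multi_choice_prompt` and `answer_letter` are hypothetical and are not taken from the paper.

```python
from string import ascii_uppercase


def build_multi_choice_prompt(question, options):
    """Format a question and candidate answers as a multi-choice prompt.

    The template is an illustrative assumption, not the paper's format.
    """
    if len(options) > len(ascii_uppercase):
        raise ValueError("too many options for single-letter labels")
    lines = [f"Question: {question}"]
    # Label each (possibly long-form) option with one letter: A, B, C, ...
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def answer_letter(correct_index):
    """Return the option letter that stands in for the long-form answer."""
    return ascii_uppercase[correct_index]


prompt = build_multi_choice_prompt(
    "Why is the sky blue?",
    [
        "Shorter wavelengths of sunlight are scattered more by air molecules.",
        "The ocean reflects its color onto the atmosphere.",
    ],
)
target = answer_letter(0)
```

With this framing, a neuron-attribution method would only need to score the model's prediction of `target` after `prompt`, rather than a multi-token free-form answer.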