Can Input Attributions Explain Inductive Reasoning in In-Context Learning?
Mengyu Ye, Tatsuki Kuribayashi, Goro Kobayashi, Jun Suzuki

TL;DR
This paper investigates whether input attribution methods can explain how large language models perform inductive reasoning in in-context learning, using synthetic tasks inspired by psycholinguistics to analyze interpretability challenges.
Contribution
The study introduces synthetic diagnostic tasks for inductive reasoning and evaluates input attribution methods, revealing their effectiveness and limitations in interpreting ICL in LLMs.
Findings
A simple input attribution method performs best.
Larger models are harder to interpret with gradient-based IA methods.
Certain IA methods can identify influential examples in ICL.
Abstract
Interpreting the internal process of neural models has long been a challenge. This challenge remains relevant in the era of large language models (LLMs) and in-context learning (ICL); for example, ICL poses a new issue of interpreting which example in the few-shot examples contributed to identifying/solving the task. To this end, in this paper, we design synthetic diagnostic tasks of inductive reasoning, inspired by the generalization tests typically adopted in psycholinguistics. Here, most in-context examples are ambiguous w.r.t. their underlying rule, and one critical example disambiguates it. The question is whether conventional input attribution (IA) methods can track such a reasoning process, i.e., identify the influential example, in ICL. Our experiments provide several practical findings; for example, a certain simple IA method works the best, and the larger the model, the…
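The evaluation idea described in the abstract (check whether an attribution method points at the one disambiguating example) can be sketched roughly as follows. This is a minimal illustration, not the authors' protocol: the toy task ("copy the first number" vs. "copy the larger number"), the helper names `build_prompt`, `example_scores`, and `top_example`, the whitespace tokenization, and the choice to aggregate by summing absolute token scores are all assumptions made here. The token attribution values are hand-written placeholders, not model outputs or results from the paper.

```python
from typing import List, Tuple

def build_prompt(examples: List[Tuple[str, str]]):
    """Flatten (input, output) pairs into tokens and record each example's token span."""
    tokens: List[str] = []
    spans: List[Tuple[int, int]] = []
    for x, y in examples:
        start = len(tokens)
        tokens.extend(x.split() + ["->", y])
        spans.append((start, len(tokens)))  # half-open span [start, end)
    return tokens, spans

def example_scores(token_attr: List[float], spans: List[Tuple[int, int]]) -> List[float]:
    """Aggregate token-level attributions into one score per in-context example.

    Summing absolute values is an assumed aggregation, chosen for simplicity.
    """
    return [sum(abs(a) for a in token_attr[s:e]) for s, e in spans]

def top_example(token_attr: List[float], spans: List[Tuple[int, int]]) -> int:
    """Return the index of the example with the highest aggregated attribution."""
    scores = example_scores(token_attr, spans)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical toy task: examples 0, 1, 3 fit both "copy first" and "copy larger";
# example 2 fits only "copy larger", so it is the critical (disambiguating) one.
examples = [("3 1", "3"), ("5 2", "5"), ("2 7", "7"), ("9 4", "9")]
critical_index = 2

tokens, spans = build_prompt(examples)

# Placeholder attributions: in practice these would come from an IA method
# applied to a model's prediction on a query appended after the examples.
token_attr = [0.1] * len(tokens)
token_attr[9] = 0.9  # pretend the method highlights a token inside example 2

hit = top_example(token_attr, spans) == critical_index
print("attribution identifies the critical example:", hit)
```

Repeating this check over many prompts, with the critical example placed at different positions, would give an accuracy-style score for each attribution method under these assumptions.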
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
Methods: Sparse Evolutionary Training
