Revisiting In-context Learning Inference Circuit in Large Language Models
Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue

TL;DR
This paper introduces a comprehensive inference circuit model for understanding in-context learning in large language models, explaining various observed phenomena and highlighting the circuit's importance through ablation studies.
Contribution
It proposes a detailed inference circuit that models ICL dynamics, unifies fragmented phenomena, and demonstrates the circuit's critical role via ablation analysis.
Findings
The inference circuit captures many ICL phenomena.
Ablation of circuit steps reduces ICL performance.
Parallel bypass mechanisms also exist for ICL.
Abstract
In-context Learning (ICL) is an emerging few-shot learning paradigm for Language Models (LMs) whose inner mechanisms remain unexplored. Existing works describe the inner processing of ICL, but they struggle to capture all the inference phenomena in large language models. This paper therefore proposes a comprehensive circuit to model the inference dynamics and tries to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Input Text Encode: LMs encode every input text (in the demonstrations and queries) into linear representations in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search the joint…
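The three operations named in the abstract can be illustrated with a toy sketch. This is not the authors' method or code: the bag-of-words encoder, the tiny vocabulary, and every function name below are hypothetical stand-ins for what happens inside a transformer's hidden states. The abstract's description of step (3) is cut off above, so the retrieval step here is an assumption, modeled as similarity-weighted copying of labels.

```python
import numpy as np

# Hypothetical toy vocabulary; a real LM uses learned hidden states instead.
VOCAB = ["good", "great", "bad", "awful", "movie", "film"]


def encode(text):
    """Step 1 (toy stand-in): encode text as an L2-normalized bag-of-words vector."""
    words = text.lower().split()
    vec = np.array([float(words.count(w)) for w in VOCAB])
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def merge(demos):
    """Step 2 (toy stand-in): pair each demonstration encoding with its label."""
    keys = np.stack([encode(text) for text, _ in demos])
    labels = [label for _, label in demos]
    return keys, labels


def retrieve_and_copy(query, keys, labels):
    """Step 3 (assumed form): attend from the query to the joint
    representations and copy labels in proportion to the attention weights."""
    scores = keys @ encode(query)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    totals = {}
    for weight, label in zip(weights, labels):
        totals[label] = totals.get(label, 0.0) + float(weight)
    return max(totals, key=totals.get), totals


demos = [("great movie", "positive"), ("awful film", "negative")]
keys, labels = merge(demos)
prediction, label_weights = retrieve_and_copy("a great movie", keys, labels)
print(prediction, label_weights)
```

In this sketch the query overlaps only with the first demonstration, so most of the attention weight, and therefore the copied label, comes from that demonstration.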
Taxonomy
Topics: Topic Modeling
