Revisiting In-context Learning Inference Circuit in Large Language Models

Hakaze Cho; Mariko Kato; Yoshihiro Sakai; Naoya Inoue

arXiv:2410.04468·cs.CL·February 21, 2025

TL;DR

This paper introduces a comprehensive inference circuit model for understanding in-context learning in large language models, explaining various observed phenomena and highlighting the circuit's importance through ablation studies.

Contribution

It proposes a detailed inference circuit that models ICL dynamics, unifies fragmented phenomena, and demonstrates the circuit's critical role via ablation analysis.

Findings

01. The inference circuit captures many ICL phenomena.
02. Ablation of circuit steps reduces ICL performance.
03. Parallel bypass mechanisms also exist for ICL.

Abstract

In-context Learning (ICL) is an emerging few-shot learning paradigm for Language Models (LMs) whose inner mechanisms remain unexplored. Existing works describe the inner processing of ICL, but they struggle to capture all the inference phenomena observed in large language models. This paper therefore proposes a comprehensive circuit that models the inference dynamics and tries to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Input Text Encode: LMs encode every input text (in the demonstrations and queries) into a linear representation in the hidden states, with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search the joint…
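
The three operations named in the abstract can be pictured with a toy sketch. This is not the authors' code and involves no language model: `encode`, `merge`, `retrieve_and_copy`, and `predict` are hypothetical helpers, bag-of-words vectors stand in for hidden states, and a softmax over dot-product similarity stands in for attention. The abstract is cut off in step (3), so the retrieval step here is only an assumed reading of the name "Feature Retrieval and Copy".

```python
import math
from collections import Counter

def encode(text):
    """Step 1 stand-in: a unit-norm bag-of-words vector (assumption, not a real hidden state)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}

def merge(demos):
    """Step 2 stand-in: pair each encoded demonstration with its label."""
    return [(encode(text), label) for text, label in demos]

def retrieve_and_copy(query_vec, joint, temperature=0.1):
    """Step 3 stand-in (assumed): attend to similar demonstrations, copy their labels.

    Expects at least one demonstration in `joint`.
    """
    # Dot product of unit vectors, scaled like an attention logit.
    scores = [
        sum(v * key.get(word, 0.0) for word, v in query_vec.items()) / temperature
        for key, _ in joint
    ]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    copied = {}
    for weight, (_, label) in zip(weights, joint):
        copied[label] = copied.get(label, 0.0) + weight / total
    return copied  # label -> attention mass copied into the query position

def predict(demos, query):
    """Run the three toy steps and decode the label with the most copied mass."""
    copied = retrieve_and_copy(encode(query), merge(demos))
    return max(copied, key=copied.get), copied

demos = [("great wonderful movie", "positive"),
         ("terrible boring movie", "negative")]
label, copied = predict(demos, "wonderful great acting")
print(label, copied)
```

The sketch keeps the steps as separate functions so that removing one (for example, replacing `retrieve_and_copy` with uniform weights) mimics, very loosely, the idea of ablating a circuit step and watching accuracy fall.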

Code & Models

Repositories

hc495/ICL_Circuit (PyTorch, official)

Taxonomy

Topics: Topic Modeling