Iteration Head: A Mechanistic Study of Chain-of-Thought

Vivien Cabannes; Charles Arnal; Wassim Bouaziz; Alice Yang; Francois Charton; Julia Kempe

arXiv:2406.02128 · cs.LG · October 29, 2024 · 2 citations

TL;DR

This paper investigates how Chain-of-Thought (CoT) reasoning emerges in transformers. It identifies a specialized attention mechanism, the "iteration head", that supports iterative reasoning, and it measures how the resulting CoT skills transfer across tasks.

Contribution

It provides a mechanistic understanding of CoT emergence in transformers, identifying iteration heads as a key component and analyzing their behavior and transferability.

Findings

1. Iteration heads emerge in transformer models.
2. Iteration heads are specialized attention mechanisms dedicated to iterative reasoning.
3. The CoT skills that iteration heads give rise to transfer between tasks.

Abstract

Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particular, we observe the appearance of a specialized attention mechanism dedicated to iterative reasoning, which we coined "iteration heads". We track both the emergence and the precise working of these iteration heads down to the attention level, and measure the transferability of the CoT skills to which they give rise between tasks.
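To make "iterative reasoning" concrete, here is a minimal sketch in Python. It assumes a running-parity task as the controlled setting; the abstract above does not name a task, so this choice is an assumption. The function names `cot_sequence` and `idealized_iteration_attention` are hypothetical and not taken from the authors' code. The one-hot attention pattern is an idealized picture of how a head dedicated to iteration could behave, not a measured result.

```python
from typing import List


def cot_sequence(bits: List[int]) -> List[int]:
    """Chain-of-thought tokens for the assumed task: parity after each input bit."""
    state = 0
    steps = []
    for b in bits:
        state ^= b          # one iteration: combine previous state with the next input
        steps.append(state)
    return steps


def idealized_iteration_attention(n: int) -> List[List[float]]:
    """Hypothetical attention pattern: CoT step t puts all its weight on input token t."""
    return [[1.0 if j == t else 0.0 for j in range(n)] for t in range(n)]


bits = [1, 0, 1, 1]
print(cot_sequence(bits))  # [1, 1, 0, 1]
for row in idealized_iteration_attention(len(bits)):
    print(row)
```

Each chain-of-thought token depends only on the previous token and one input token, which is the kind of step-by-step structure a head specialized for iteration could exploit.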

Code & Models

Repositories

facebookresearch/pal (PyTorch, official)

Videos

Iteration Head: A Mechanistic Study of Chain-of-Thought · SlidesLive

Taxonomy

Topics: Advanced Graph Neural Networks · Ferroelectric and Negative Capacitance Devices · Topic Modeling