From Indirect Object Identification to Syllogisms: Exploring Binary Mechanisms in Transformer Circuits

Karim Saraipour; Shichang Zhang

arXiv:2508.16109·cs.CL·August 25, 2025

TL;DR

This paper investigates how GPT-2 small handles binary truth values in syllogistic prompts, identifying circuits and binary mechanisms that account for its behavior on these logical-reasoning tasks.

Contribution

The paper identifies attention-head circuits responsible for syllogistic reasoning in GPT-2 small and shows that they explain the model's logical-reasoning behavior with high faithfulness.

Findings

1. Identified circuits with five attention heads explaining over 90% of performance

2. Discovered binary mechanisms enabling negation through negative heads

3. Linked syllogistic reasoning to prior IOI analysis

Abstract

Transformer-based language models (LMs) can perform a wide range of tasks, and mechanistic interpretability (MI) aims to reverse engineer the components responsible for task completion to understand their behavior. Previous MI research has focused on linguistic tasks such as Indirect Object Identification (IOI). In this paper, we investigate the ability of GPT-2 small to handle binary truth values by analyzing its behavior with syllogistic prompts, e.g., "Statement A is true. Statement B matches statement A. Statement B is", which requires more complex logical reasoning compared to IOI. Through our analysis of several syllogism tasks of varying difficulty, we identify multiple circuits that mechanistically explain GPT-2's logical-reasoning capabilities and uncover binary mechanisms that facilitate task completion, including the ability to produce a negated token not present in the input…
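To make the prompt format concrete, the following is a minimal sketch of how syllogistic prompts of the kind quoted in the abstract could be generated along with their expected answers. The template and the relation word "matches" come from the abstract's example. The relation word "contradicts", the function names, and the dataset layout are assumptions made for illustration and are not confirmed to be the authors' setup.

```python
from itertools import product

# Template taken from the example prompt quoted in the abstract.
TEMPLATE = "Statement A is {value}. Statement B {relation} statement A. Statement B is"

# Binary truth values and their negations.
NEGATE = {"true": "false", "false": "true"}

# Maps a relation word to whether it flips the truth value.
# "matches" appears in the abstract; "contradicts" is a hypothetical
# negating relation chosen for this sketch.
RELATIONS = {"matches": False, "contradicts": True}


def make_prompt(value, relation):
    """Return (prompt, expected_answer) for one syllogism."""
    flips = RELATIONS[relation]
    answer = NEGATE[value] if flips else value
    return TEMPLATE.format(value=value, relation=relation), answer


def build_dataset():
    """Enumerate every combination of truth value and relation."""
    return [make_prompt(v, r) for v, r in product(NEGATE, RELATIONS)]


if __name__ == "__main__":
    for prompt, answer in build_dataset():
        print(f"{prompt!r} -> {answer}")
```

In the negating case the expected answer is a token that never appears in the prompt, which is the situation the abstract refers to when it mentions producing a negated token not present in the input.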
