Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models

Philipp Mondorf; Sondre Wold; Barbara Plank

arXiv:2410.01434 · cs.LG · June 24, 2025

TL;DR

This paper investigates the modular structure of transformer-based language models by identifying and comparing circuits for compositional subtasks, showing that such circuits can be reused and combined to represent more complex functions.

Contribution

The paper introduces a method for identifying and comparing circuits for modular subtasks, and examines their overlap, faithfulness, and compositional reuse within a language model.

Findings

01. Circuits for similar tasks show significant node overlap.

02. Identified circuits are faithful to task behavior.

03. Circuits can be combined to model complex functions.
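
To make the first and third findings concrete, here is a minimal sketch of how node overlap between two circuits might be measured and how two circuits might be combined. It assumes a circuit can be represented as a set of node labels, uses intersection-over-union as the overlap measure, and uses set union as the combination step. These choices, the function names `node_iou` and `compose`, and the node and task labels are all illustrative assumptions, not the paper's method or data.

```python
def node_iou(circuit_a, circuit_b):
    """Intersection-over-union of two circuits given as collections of node labels."""
    a, b = set(circuit_a), set(circuit_b)
    union = a | b
    # Two empty circuits share nothing; define their overlap as 0.0.
    return len(a & b) / len(union) if union else 0.0


def compose(*circuits):
    """Combine circuits by taking the union of their node sets (an assumed scheme)."""
    combined = set()
    for circuit in circuits:
        combined |= set(circuit)
    return combined


# Hypothetical node labels and task names, invented for illustration only.
circuit_task_a = {"L0.attn.h1", "L0.mlp", "L1.attn.h3"}
circuit_task_b = {"L0.attn.h1", "L1.attn.h3", "L1.mlp"}

overlap = node_iou(circuit_task_a, circuit_task_b)
combined = compose(circuit_task_a, circuit_task_b)

print(f"node IoU: {overlap:.2f}")
print(f"combined circuit size: {len(combined)}")
```

An overlap near 1 would indicate that two tasks rely on largely the same nodes, while an overlap near 0 would indicate mostly disjoint subnetworks.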

Abstract

A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions through subnetworks that can be composed to perform more complex tasks. Recent advances in mechanistic interpretability have made progress in identifying circuits, which represent the minimal computational subgraphs responsible for a model's behavior on specific tasks. However, most studies focus on identifying circuits for individual tasks without investigating how functionally similar circuits relate to each other. To address this gap, we study the modularity of neural networks by analyzing circuits for highly compositional subtasks within a transformer-based language model. Specifically, given a probabilistic context-free grammar, we identify and compare circuits responsible for ten modular string-edit operations. Our…
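
As a rough illustration of the setup the abstract describes, the sketch below samples a string from a toy probabilistic context-free grammar and applies one string-edit operation to it. The grammar, its probabilities, and the `reverse` operation are hypothetical stand-ins chosen for brevity; the abstract does not specify the grammar or list the ten operations, so nothing here should be read as the paper's actual task definition.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (probability, right-hand side).
# Symbols that do not appear as keys are treated as terminals.
GRAMMAR = {
    "S": [(0.5, ["A", "S"]), (0.5, ["A"])],
    "A": [(0.5, ["a"]), (0.5, ["b"])],
}


def sample(symbol, rng):
    """Expand a symbol into a list of terminal tokens by sampling grammar rules."""
    if symbol not in GRAMMAR:
        return [symbol]
    probs, expansions = zip(*GRAMMAR[symbol])
    chosen = rng.choices(expansions, weights=probs, k=1)[0]
    tokens = []
    for child in chosen:
        tokens.extend(sample(child, rng))
    return tokens


def reverse(tokens):
    """An example string-edit operation (assumed for illustration): reverse the sequence."""
    return tokens[::-1]


rng = random.Random(0)  # fixed seed so repeated runs produce the same sample
tokens = sample("S", rng)
edited = reverse(tokens)

print("input: ", " ".join(tokens))
print("output:", " ".join(edited))
```

Pairs of sampled inputs and edited outputs like these could serve as training or evaluation examples for a single subtask, with other operations swapped in for `reverse`.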

Taxonomy

Topics: Model-Driven Software Engineering Techniques

Methods: Sparse Evolutionary Training · Focus