Position-aware Automatic Circuit Discovery
Tal Haklay, Hadas Orgad, David Bau, Aaron Mueller, Yonatan Belinkov

TL;DR
This paper introduces a position-aware approach to circuit discovery in language models, making it possible to identify mechanisms that vary across input positions, including on datasets whose examples differ in length.
Contribution
It extends existing gradient-based circuit discovery methods to incorporate positional information and introduces a dataset schema concept for variable-length examples.
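As a rough illustration of the dataset schema idea, the sketch below tags two examples of different lengths with the same set of span names, so that "the same place" in each example can be referred to by span rather than by raw token index. The `Span` class, the `position_to_span` helper, the span names (`modifiers`, `subject`, `verb`), and the requirement that spans cover every token exactly once are all assumptions made for this example. They are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labelled token span (hypothetical structure, not the paper's)."""
    name: str   # span label shared across examples
    start: int  # first token index, inclusive
    end: int    # last token index, exclusive


def position_to_span(spans, num_tokens):
    """Return one span name per token index.

    Assumption for this sketch: spans must cover each token exactly once.
    """
    labels = [None] * num_tokens
    for span in spans:
        for pos in range(span.start, span.end):
            if labels[pos] is not None:
                raise ValueError(f"token {pos} is covered by two spans")
            labels[pos] = span.name
    if any(label is None for label in labels):
        raise ValueError("some tokens are not covered by any span")
    return labels


# Two toy examples of different lengths, tagged with the same span names.
tokens_a = ["The", "tall", "doctor", "smiled"]
spans_a = [Span("modifiers", 0, 2), Span("subject", 2, 3), Span("verb", 3, 4)]

tokens_b = ["A", "nurse", "smiled"]
spans_b = [Span("modifiers", 0, 1), Span("subject", 1, 2), Span("verb", 2, 3)]

labels_a = position_to_span(spans_a, len(tokens_a))
labels_b = position_to_span(spans_b, len(tokens_b))
```

Although the two examples have four and three tokens, both resolve to the same three span names, which is what lets position-dependent scores be compared across examples.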
Findings
Improved circuit discovery with better size-faithfulness trade-offs.
Automated pipeline for schema generation using large language models.
Enhanced detection of position-sensitive mechanisms in language models.
Abstract
A widely used strategy to discover and understand language model mechanisms is circuit analysis. A circuit is a minimal subgraph of a model's computation graph that executes a specific task. We identify a gap in existing circuit discovery methods: they assume circuits are position-invariant, treating model components as equally relevant across input positions. This limits their ability to capture cross-positional interactions or mechanisms that vary across positions. To address this gap, we propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. First, we extend edge attribution patching, a gradient-based method for circuit discovery, to differentiate between token positions. Second, we introduce the concept of a dataset schema, which defines token spans with similar semantics across examples, enabling position-aware…
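The difference between position-invariant and position-aware scoring can be sketched with a first-order, gradient-times-activation-difference edge score, which is the general form used by attribution patching methods. The code below is a simplified illustration under stated assumptions, not the authors' implementation: it uses random NumPy arrays in place of real model activations and gradients, it scores only a single edge between components at the same position (cross-position edges are omitted), and the function names, the sign convention, and the span labels are hypothetical.

```python
import numpy as np


def edge_scores_per_position(clean_upstream, corrupt_upstream, downstream_grad):
    """First-order edge score, kept separate for each token position.

    All arrays have shape (num_positions, d_model). The sign convention
    (corrupt minus clean) is an assumption of this sketch.
    """
    diff = corrupt_upstream - clean_upstream
    # Sum over the hidden dimension only, so one score remains per position.
    return np.sum(diff * downstream_grad, axis=-1)


def aggregate_by_span(per_position_scores, span_labels):
    """Sum per-position scores over the tokens that share a span label."""
    totals = {}
    for score, label in zip(per_position_scores, span_labels):
        totals[label] = totals.get(label, 0.0) + float(score)
    return totals


# Random stand-ins for activations and gradients (illustrative only).
rng = np.random.default_rng(0)
num_positions, d_model = 4, 8
clean = rng.normal(size=(num_positions, d_model))
corrupt = rng.normal(size=(num_positions, d_model))
grad = rng.normal(size=(num_positions, d_model))

per_position = edge_scores_per_position(clean, corrupt, grad)

# A position-invariant score collapses all positions into one number.
position_agnostic = per_position.sum()

# Hypothetical span labels for a four-token example.
span_labels = ["modifiers", "modifiers", "subject", "verb"]
span_scores = aggregate_by_span(per_position, span_labels)
```

Summing over positions discards where the edge matters, whereas the per-position and per-span scores keep that information. Grouping by span rather than by raw index is what keeps the scores comparable when examples have different lengths.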
Taxonomy
Topics: Advanced Database Systems and Queries · Algorithms and Data Compression · Logic, programming, and type systems
