Position-aware Automatic Circuit Discovery

Tal Haklay; Hadas Orgad; David Bau; Aaron Mueller; Yonatan Belinkov

arXiv:2502.04577 · cs.LG · February 10, 2025

TL;DR

This paper introduces a position-aware approach to circuit discovery in language models, enhancing the ability to identify mechanisms that vary across input positions, especially in variable-length datasets.

Contribution

The paper extends existing gradient-based circuit discovery methods to incorporate positional information, and introduces the concept of a dataset schema so that position-aware circuits can be found on variable-length examples.
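To make the schema idea concrete, here is a minimal sketch of how per-token attribution scores from examples of different lengths could be mapped onto a shared set of spans. The span names (`subject`, `verb`, `object`), the helper `aggregate_by_span`, and the choice to sum scores within a span are illustrative assumptions, not details taken from the paper or its repository.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_by_span(token_scores: List[float], token_spans: List[str]) -> Dict[str, float]:
    """Sum per-token scores into one score per schema span.

    Assumption: scores within a span are combined by summation.
    """
    if len(token_scores) != len(token_spans):
        raise ValueError("each token needs exactly one span label")
    totals: Dict[str, float] = defaultdict(float)
    for score, span in zip(token_scores, token_spans):
        totals[span] += score
    return dict(totals)


# Two hypothetical examples of different lengths that share one schema.
ex_a_spans = ["subject", "subject", "verb", "object"]
ex_a_scores = [0.5, 0.25, 1.0, 2.0]

ex_b_spans = ["subject", "verb", "object", "object", "object"]
ex_b_scores = [1.0, 0.5, 0.25, 0.25, 0.5]

span_scores_a = aggregate_by_span(ex_a_scores, ex_a_spans)
span_scores_b = aggregate_by_span(ex_b_scores, ex_b_spans)
```

After aggregation both examples are described by the same three span keys, so their scores can be compared or averaged even though the raw token positions do not line up.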

Findings

01. Improved circuit discovery with better size-faithfulness trade-offs.

02. Automated pipeline for schema generation using large language models.

03. Enhanced detection of position-sensitive mechanisms in language models.

Abstract

A widely used strategy to discover and understand language model mechanisms is circuit analysis. A circuit is a minimal subgraph of a model's computation graph that executes a specific task. We identify a gap in existing circuit discovery methods: they assume circuits are position-invariant, treating model components as equally relevant across input positions. This limits their ability to capture cross-positional interactions or mechanisms that vary across positions. To address this gap, we propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. First, we extend edge attribution patching, a gradient-based method for circuit discovery, to differentiate between token positions. Second, we introduce the concept of a dataset schema, which defines token spans with similar semantics across examples, enabling position-aware…
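The difference between position-invariant and position-aware attribution can be illustrated with a small sketch. It uses the standard attribution patching approximation, in which an edge's score is the dot product of the activation difference (corrupted minus clean) with the gradient of the metric at the downstream input. The function names, the toy numbers, and the restriction to edges that stay within a single token position are assumptions made for illustration; this is not the paper's implementation.

```python
from typing import List

Matrix = List[List[float]]  # indexed as [position][hidden_dim]


def per_position_scores(clean: Matrix, corrupted: Matrix, grad: Matrix) -> List[float]:
    """Attribution score of one edge at each token position.

    score[p] = (corrupted[p] - clean[p]) . grad[p]
    """
    scores = []
    for clean_p, corrupted_p, grad_p in zip(clean, corrupted, grad):
        diff = [c - k for c, k in zip(corrupted_p, clean_p)]
        scores.append(sum(d * g for d, g in zip(diff, grad_p)))
    return scores


def position_invariant_score(clean: Matrix, corrupted: Matrix, grad: Matrix) -> float:
    """Collapse positions by summing, as a position-invariant method would."""
    return sum(per_position_scores(clean, corrupted, grad))


# Toy activations for a 2-token input with hidden size 2 (made-up values).
clean = [[1.0, 0.0], [0.0, 1.0]]
corrupted = [[0.0, 0.0], [0.0, 3.0]]
grad = [[2.0, 1.0], [1.0, 1.0]]

by_position = per_position_scores(clean, corrupted, grad)
collapsed = position_invariant_score(clean, corrupted, grad)
```

In this toy case the edge matters at both positions but with opposite signs, so the summed score is zero and the edge would look irrelevant. Keeping the scores separate per position preserves that information.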

Code & Models

Repositories

technion-cs-nlp/peap (jax, official)

Videos

Position-aware Automatic Circuit Discovery · underline

Taxonomy

Topics: Advanced Database Systems and Queries · Algorithms and Data Compression · Logic, programming, and type systems