CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles

Swapnil Parekh

arXiv:2603.00523 · cs.CL · March 23, 2026

Open Access

TL;DR

CIRCUS identifies robust circuit structure by pruning one attribution graph under multiple configurations, separating edges that survive every configuration from threshold artifacts with minimal computational overhead.

Contribution

The paper presents an ensemble-based approach to circuit discovery that quantifies the robustness of each edge and extracts a consensus circuit, improving interpretability and reliability.

Findings

01 Consensus circuits are 40x smaller than the union of all configurations.

02 They retain explanatory power comparable to the larger unions.

03 They outperform influence-ranked and random baselines.

Abstract

Every mechanistic circuit carries an invisible asterisk: it reflects not just the model's computation, but the analyst's choice of pruning threshold. Change that choice and the circuit changes, yet current practice treats a single pruned subgraph as ground truth with no way to distinguish robust structure from threshold artifacts. We introduce CIRCUS, which reframes circuit discovery as a problem of uncertainty over explanations. CIRCUS prunes one attribution graph under B configurations, assigns each edge an empirical inclusion frequency s(e) in [0,1] measuring how robustly it survives across the configuration family, and extracts a consensus circuit of edges present in every view. This yields a principled core/contingent/noise decomposition (analogous to posterior model-inclusion indicators in Bayesian variable selection) that separates robust…
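
The core idea in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each configuration is a magnitude threshold on edge weights and that an edge's inclusion frequency is s(e) = (number of configurations in which e survives) / B. The function names, the toy graph, and the rule that "noise" means s(e) = 0 are hypothetical and are not taken from the paper; only the consensus rule (edges present in every view) comes from the abstract.

```python
from collections import Counter


def prune(edge_weights, threshold):
    """Hypothetical pruning rule: keep edges whose |weight| >= threshold."""
    return {e for e, w in edge_weights.items() if abs(w) >= threshold}


def inclusion_frequencies(edge_weights, thresholds):
    """Return s(e) in [0, 1]: the fraction of configurations keeping edge e."""
    counts = Counter()
    for t in thresholds:
        counts.update(prune(edge_weights, t))
    b = len(thresholds)  # B, the number of configurations
    return {e: counts[e] / b for e in edge_weights}


def decompose(freqs):
    """Split edges into core / contingent / noise.

    core: present in every view (the consensus circuit, per the abstract).
    noise: present in no view (an assumption made for this sketch).
    contingent: everything in between.
    """
    core = {e for e, s in freqs.items() if s == 1.0}
    noise = {e for e, s in freqs.items() if s == 0.0}
    contingent = set(freqs) - core - noise
    return core, contingent, noise


# Toy attribution graph: edge -> attribution weight (made-up values).
edge_weights = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.5,
    ("a", "c"): 0.2,
    ("c", "d"): 0.01,
}
thresholds = [0.05, 0.1, 0.3, 0.6]  # B = 4 hypothetical configurations

freqs = inclusion_frequencies(edge_weights, thresholds)
core, contingent, noise = decompose(freqs)
```

In this toy setup only the edge ("a", "b") survives all four thresholds, so it alone forms the consensus circuit, while the union of all pruned views contains three edges. The actual configuration family and any cutoffs used by CIRCUS may differ from the ones assumed here.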

Taxonomy

Topics: Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning · Machine Learning in Materials Science