IBCircuit: Towards Holistic Circuit Discovery with Information Bottleneck
Tian Bian, Yifan Niu, Chaohao Yuan, Chengzhi Piao, Bingzhe Wu, Long-Kai Huang, Yu Rong, Tingyang Xu, Hong Cheng, Jia Li

TL;DR
This paper introduces IBCircuit, an end-to-end method based on the Information Bottleneck principle, for holistic discovery of informative circuits within language models that explain task-specific behaviors more accurately and efficiently.
Contribution
It presents a novel optimization framework that identifies minimal and faithful circuits without task-specific corrupted activation design, improving over prior methods.
Findings
On the Indirect Object Identification (IOI) and Greater-Than tasks, IBCircuit finds more faithful circuits than prior methods.
It identifies minimal circuits with critical nodes and edges.
The method is applicable to any task without task-specific modifications.
Abstract
Circuit discovery has recently attracted attention as a potential research direction to explain the non-trivial behaviors of language models. It aims to find the computational subgraphs, also known as circuits, within the model that are responsible for solving specific tasks. However, most existing studies overlook the holistic nature of these circuits and require designing specific corrupted activations for different tasks, which is both inaccurate and inefficient. In this work, we propose an end-to-end approach based on the principle of Information Bottleneck, called IBCircuit, to identify informative circuits holistically. IBCircuit is an optimization framework for holistic circuit discovery and can be applied to any given task without the tedious design of corrupted activations. In both the Indirect Object Identification (IOI) and Greater-Than tasks, IBCircuit identifies more faithful and…
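For context, the Information Bottleneck principle that the abstract refers to is usually written as the trade-off below. This is the generic textbook form, not IBCircuit's exact objective, which this summary does not give. Reading X as the full model's internal activations, Z as the compressed representation kept by the circuit, and Y as the task output is an assumption made here for illustration only.

```latex
% Generic Information Bottleneck Lagrangian (illustrative; symbols X, Z, Y, \beta are
% the standard ones, and their mapping to circuits is an assumption, not the paper's notation).
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```

The first term penalizes how much of the input the representation retains (compression), while the second rewards information about the target (prediction); the weight \beta sets the balance between the two. Under the reading assumed above, this would loosely correspond to keeping a circuit minimal while keeping it faithful.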
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Machine Learning in Materials Science
