Fine-Tuning is Subgraph Search: A New Lens on Learning Dynamics
Yueyan Li, Wenhao Gao, Caixia Yuan, Xiaojie Wang

TL;DR
This paper introduces a novel fine-tuning method called circuit-tuning that models learning as subgraph search within a neural network, offering insights into learning dynamics and improving task-specific performance.
Contribution
It proposes a new analytical framework for understanding fine-tuning as subgraph search, validated through experiments and analysis of learning dynamics.
Findings
Circuit-tuning balances task performance and general capabilities.
Learning dynamics can be understood through subgraph optimization.
The method offers new insights into neural network training mechanisms.
Abstract
The study of mechanistic interpretability aims to reverse-engineer a model to explain its behaviors. While recent studies have focused on the static mechanism of a certain behavior, the learning dynamics inside a model remain to be explored. In this work, we develop a fine-tuning method for analyzing the mechanism behind learning. Inspired by the concept of intrinsic dimension, we view a model as a computational graph with redundancy for a specific task, and treat the fine-tuning process as a search for and optimization of a subgraph within this graph. Based on this hypothesis, we propose circuit-tuning, an algorithm that iteratively builds the subgraph for a specific task and updates the relevant parameters in a heuristic way. We first validate our hypothesis through a carefully designed experiment and provide a detailed analysis of the learning dynamics during fine-tuning.…
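The abstract's core loop (pick a task-relevant subgraph, update only its parameters, repeat) can be illustrated with a toy sketch. This is not the authors' algorithm: the linear model, the use of gradient magnitude as the relevance score, and the names `circuit_tuning_sketch`, `make_data`, `top_n`, and `lr` are all assumptions made for illustration. The only detail borrowed from the page is that a top-N cutoff controls the size of the selected set.

```python
import random

def circuit_tuning_sketch(xs, ys, dim, top_n=2, lr=0.05, steps=200):
    """Toy loop: select a top-N set of weights, then update only those.

    Hypothetical illustration; the relevance score (gradient magnitude)
    is an assumption, not the scoring rule used in the paper.
    """
    w = [0.0] * dim
    history = []  # the selected index set at every step
    for _ in range(steps):
        # Mean-squared-error gradient for a linear model y_hat = w . x
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(dim):
                grad[j] += 2.0 * err * x[j] / len(xs)
        # "Search": rank weights by the assumed relevance score, keep top-N.
        ranked = sorted(range(dim), key=lambda j: abs(grad[j]), reverse=True)
        selected = set(ranked[:top_n])
        # "Optimize": update only the selected weights; the rest stay frozen.
        for j in selected:
            w[j] -= lr * grad[j]
        history.append(selected)
    return w, history

def make_data(n=64, dim=6, seed=0):
    """Synthetic task in which only features 0 and 3 carry signal."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    ys = [3.0 * x[0] - 2.0 * x[3] for x in xs]
    return xs, ys

xs, ys = make_data()
weights, history = circuit_tuning_sketch(xs, ys, dim=6)
```

Recording `history` keeps the selected set at every step, so one can inspect how the chosen subgraph changes during training rather than only comparing the model before and after.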
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
Section 4, i.e., the main findings, is really thorough and well done (even if a lot of back and forth between appendix and main paper is needed). I can easily see this analysis promoting further work down the line for using fine-tuning to do hypothesis testing for circuits.
- Relation to prior work: My main apprehension with the paper is how it's contextualized with respect to the broader literature. Specifically, approaches similar to this paper's were proposed in work like [1] to iteratively add a task to a model by identifying a parameter mask that maximally aids learning of a task without interfering with other learned tasks. The goal here was continual learning and the precise domain / model class is different, but subsequent papers from this group generalize t
* The direction analyzed by the authors seems quite interesting. I really liked their approach of using circuits to investigate fine-tuning.
* The results in section E.4.3 are quite interesting. However, it would strengthen the paper significantly if they were organized properly in the main paper.
* I think the paper could be significantly improved in terms of writing (especially Sec. 4.4). It is hard to follow Sec. 4.4 and their circuit pruning algorithm. The current draft is focused more on proposing an algorithm which could be used instead of fine-tuning to achieve improved results. However, I think the empirical evidence (Table 1) is not very convincing as the gains are quite minimal. Instead, if the authors would focus more on analyzing how the circuits change over the course of tra
1. Fine-tuning large models efficiently is pertinent to today's research landscape. This paper offers a fresh perspective on how that can be done based on several existing ideas upon which the proposed method is built, such as intrinsic dimensionality and circuit analysis. Packaging fine-tuning with mechanistic interpretability is a novel proposition, with promising initial insights that can guide further research. 2. A relatively simple but effective task is designed to validate the method, wh
1. top-N is a task-relevant hyperparameter that is likely expensive to tune. While limitations of the current study are discussed in the appendix, I think there are no major weaknesses at this stage.
The proposed method is simple and elegant. The finding that restricting fine-tuning to a relevant circuit can provide utility is interesting, and may be novel (I've spoken to several researchers who intended to work on something like this about 2 years ago, but I didn’t find any works that obviously actually did this in their Google Scholar, so I’m not sure). Indeed, the results in Table 1 appear quite strong, and this is noteworthy and potentially enough to merit publication absent other issu
The clarity and quality of presentation are low overall, and the submission includes various imprecise claims, some of which seem false. From what I can tell, the analysis in Section 4 fails to make any use of the Cs that are discovered throughout training, and only performs static analyses on the model before and after fine-tuning. So while the method could be used to study the dynamics of learning, as promised elsewhere, it seems the submission never actually does that. This invalidates much of the motiv
Taxonomy
Topics: Sensor Technology and Measurement Systems · Neural Networks and Applications
