Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xu Wang, Yan Hu, Wenyu Du, Reynold Cheng, Benyou Wang, Difan Zou

TL;DR
This paper uses circuit analysis to interpret the fine-tuning process of LLMs. It reveals how circuits change during fine-tuning and develops a circuit-aware LoRA method that improves performance and offers insights into task composition.
Contribution
It introduces a circuit analysis approach to understand fine-tuning mechanisms and proposes a circuit-aware LoRA method that outperforms standard LoRA.
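The summary does not say how the circuit-aware LoRA method uses circuit information. One plausible reading is to give modules inside the identified circuit a larger share of a fixed LoRA rank budget. The sketch below is hypothetical and is not the authors' method: the function name `allocate_lora_ranks`, the module names, and the `circuit_weight` parameter are illustrative assumptions.

```python
# Hypothetical sketch, not the paper's algorithm: split a fixed LoRA rank
# budget across modules, weighting modules that belong to the circuit more.
def allocate_lora_ranks(modules, circuit_modules, total_rank, circuit_weight=2.0):
    # Assumed scheme: circuit modules get circuit_weight times the share of others.
    weights = [circuit_weight if m in circuit_modules else 1.0 for m in modules]
    total_w = sum(weights)
    shares = [total_rank * w / total_w for w in weights]
    ranks = [int(s) for s in shares]  # floor of each proportional share
    # Give leftover rank units to the modules with the largest fractional remainders,
    # so the ranks sum exactly to total_rank.
    leftover = total_rank - sum(ranks)
    order = sorted(range(len(modules)), key=lambda i: shares[i] - ranks[i], reverse=True)
    for i in order[:leftover]:
        ranks[i] += 1
    return dict(zip(modules, ranks))


modules = ["L0.attn", "L0.mlp", "L1.attn", "L1.mlp"]  # hypothetical module names
print(allocate_lora_ranks(modules, {"L1.attn"}, total_rank=20))
# {'L0.attn': 4, 'L0.mlp': 4, 'L1.attn': 8, 'L1.mlp': 4}
```

Keeping the total rank fixed makes the comparison with standard LoRA roughly parameter-matched under this assumed scheme; the paper's actual allocation rule may differ.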
Findings
Circuits maintain high node similarity before and after fine-tuning
Edges in circuits undergo significant changes during fine-tuning
Circuit-aware LoRA improves performance by 2.46% over standard LoRA
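The first two findings distinguish node similarity from edge changes. A minimal way to see how both can hold at once is to treat each circuit as a set of directed edges and compare checkpoints with Jaccard similarity. The choice of Jaccard as the metric and the node labels below are assumptions for illustration; the paper's exact similarity measure is not given here.

```python
# Illustrative sketch: compare two circuits (sets of directed edges) by the
# Jaccard similarity of their node sets and of their edge sets.
def jaccard(a, b):
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def circuit_similarity(edges_before, edges_after):
    # Nodes are every endpoint that appears in some edge.
    nodes_before = {n for edge in edges_before for n in edge}
    nodes_after = {n for edge in edges_after for n in edge}
    return jaccard(nodes_before, nodes_after), jaccard(edges_before, edges_after)


# Hypothetical circuits at two checkpoints: same components, rewired connections.
before = [("a0.h1", "m0"), ("m0", "a1.h3"), ("a1.h3", "logits")]
after = [("a0.h1", "a1.h3"), ("m0", "logits"), ("a1.h3", "logits")]
print(circuit_similarity(before, after))  # (1.0, 0.2)
```

Here every node survives (node similarity 1.0) while only one of five distinct edges is shared (edge similarity 0.2), which mirrors the pattern the findings describe.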
Abstract
Fine-tuning significantly improves the performance of Large Language Models (LLMs), yet its underlying mechanisms remain poorly understood. This paper aims to provide an in-depth interpretation of the fine-tuning process through circuit analysis, a popular tool in Mechanistic Interpretability (MI). Unlike previous studies (Prakash et al. 2024; Chhabra et al. 2024) that focus on tasks where pre-trained models already perform well, we develop a set of mathematical tasks where fine-tuning yields substantial performance gains, a setting closer to practical use. In our experiments, we identify circuits at various checkpoints during fine-tuning and examine the interplay between circuit analysis, fine-tuning methods, and task complexities. First, we find that while circuits maintain high node similarity before and after fine-tuning, their edges undergo significant changes, in contrast…
