Mechanistic Analysis of Circuit Preservation in Federated Learning
Muhammad Haseeb, Salaar Masood, Muhammad Abdullah Sohail

TL;DR
This paper uses mechanistic interpretability to analyze how non-IID data causes circuit degradation in federated learning, revealing that conflicting client updates lead to the collapse of class-specific sub-networks.
Contribution
It introduces a mechanistic interpretability approach to diagnose circuit collapse in federated learning under non-IID data conditions, providing the first concrete evidence of circuit divergence.
Findings
Non-IID data causes local circuits to diverge.
Circuit collapse correlates with performance degradation.
Mechanistic analysis offers new insights into FL failures.
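As a rough illustration of how divergence between two clients' circuits could be measured, the sketch below computes Intersection-over-Union, the metric named in the abstract, over binary masks. It assumes a circuit is represented as a boolean mask marking which weights belong to it; that representation, the function name `circuit_iou`, and the toy masks are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def circuit_iou(mask_a, mask_b):
    """IoU between two circuits given as binary masks of the same shape.

    Assumption: a circuit is a 0/1 mask over a layer's weights.
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty circuits are treated as identical by convention.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

# Hypothetical masks for the same class circuit on two clients.
client_a = np.array([[1, 1, 0],
                     [0, 1, 0]])
client_b = np.array([[1, 0, 0],
                     [0, 1, 1]])

print(circuit_iou(client_a, client_b))  # 2 shared weights / 4 in the union = 0.5
```

An IoU of 1.0 means the two circuits use exactly the same weights; values near 0 indicate that the clients have settled on largely disjoint sub-networks.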
Abstract
Federated Learning (FL) enables collaborative training of models on decentralized data, but its performance degrades significantly under Non-IID (non-independent and identically distributed) data conditions. While this accuracy loss is well-documented, the internal mechanistic causes remain a black box. This paper investigates the canonical FedAvg algorithm through the lens of Mechanistic Interpretability (MI) to diagnose this failure mode. We hypothesize that the aggregation of conflicting client updates leads to circuit collapse, the destructive interference of functional, sparse sub-networks responsible for specific class predictions. By training inherently interpretable, weight-sparse neural networks within an FL framework, we identify and track these circuits across clients and communication rounds. Using Intersection-over-Union (IoU) to quantify circuit preservation, we provide…
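To make the hypothesized failure mode concrete, here is a minimal sketch of the FedAvg aggregation step (a dataset-size-weighted average of client weights) applied to a toy case. The weight vectors, client sizes, and the function name `fedavg` are invented for illustration and do not come from the paper's experiments.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average client weight arrays, weighted by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()          # n_k / n for each client k
    stacked = np.stack(client_weights)    # shape: (num_clients, ...)
    return np.tensordot(coeffs, stacked, axes=1)

# Hypothetical sparse weights from two clients trained on different classes.
# The first connection is used by both clients, but with opposite signs.
w_client_1 = np.array([0.8, 0.0, -0.6])
w_client_2 = np.array([-0.8, 0.5, 0.0])

w_global = fedavg([w_client_1, w_client_2], client_sizes=[100, 100])
print(w_global)
```

In this toy case the shared connection averages to zero and the client-specific connections are halved, which is one simple way that averaging conflicting updates could weaken a sparse sub-network.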
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
