Stochastic Parameter Decomposition
Lucius Bushnaq, Dan Braun, Lee Sharkey

TL;DR
This paper introduces Stochastic Parameter Decomposition (SPD), a scalable and robust method for decomposing neural network parameters into simpler parts. It improves on the existing method, Attribution-based Parameter Decomposition (APD), in scalability and robustness to hyperparameters.
Contribution
The paper presents SPD, a new decomposition method that is scalable and robust to hyperparameters. It enables the analysis of larger and more complex neural networks and bridges causal analysis and network interpretability.
Findings
SPD is more scalable than APD for larger models
SPD is more robust to hyperparameters than APD
SPD better identifies ground truth mechanisms in toy models
Abstract
A key step in reverse engineering neural networks is to decompose them into simpler parts that can be studied in relative isolation. Linear parameter decomposition, a framework that has been proposed to resolve several issues with current decomposition methods, decomposes neural network parameters into a sum of sparsely used vectors in parameter space. However, the current main method in this framework, Attribution-based Parameter Decomposition (APD), is impractical on account of its computational cost and sensitivity to hyperparameters. In this work, we introduce Stochastic Parameter Decomposition (SPD), a method that is more scalable and robust to hyperparameters than APD, which we demonstrate by decomposing models that are slightly larger and more complex than was possible to decompose with APD. We also show that SPD avoids other issues, such as shrinkage of the learned…
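The abstract describes linear parameter decomposition as writing a network's parameters as a sum of sparsely used vectors in parameter space. The sketch below illustrates only that bookkeeping for a single weight matrix, assuming NumPy. The rank-one form of each component, the sizes, and the helper `forward` are hypothetical choices for illustration, not the paper's implementation, and the sketch does not implement SPD's stochastic training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_components = 4, 3, 5

# Hypothetical components: each one is a matrix with the same shape as W,
# i.e. a vector in parameter space. The rank-one form U_c V_c^T is an
# illustrative assumption.
U = rng.normal(size=(n_components, d_out))
V = rng.normal(size=(n_components, d_in))
components = np.einsum("co,ci->coi", U, V)  # shape (C, d_out, d_in)

# The decomposed weight matrix is the sum of all components.
W = components.sum(axis=0)

def forward(x, mask):
    """Hypothetical helper: apply only the components selected by mask.

    mask[c] = 1 keeps component c, mask[c] = 0 ablates it.
    """
    W_masked = np.einsum("c,coi->oi", mask, components)
    return W_masked @ x

x = rng.normal(size=d_in)
full = forward(x, np.ones(n_components))                  # all components active
sparse = forward(x, np.array([1.0, 0.0, 0.0, 1.0, 0.0]))  # only components 0 and 3
```

With every component active, the output matches the original `W @ x`. In an actual decomposition the components are trained so that, for any given input, a small subset suffices to reproduce the network's behavior. Here they are random, so the example shows only how a sparse subset of parameter-space components is summed and applied.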
