Mapping Machine-Learned Physics into a Human-Readable Space
Taylor Faucett, Jesse Thaler, Daniel Whiteson

TL;DR
This paper introduces a method to translate complex machine-learned classifiers into a small, interpretable set of physical observables, enhancing understanding and validation of the model's decisions in collider physics.
Contribution
The authors develop an iterative technique to identify human-readable observables that replicate a black-box classifier's decisions, demonstrated on jet classification in collider physics.
Findings
Mapped a CNN classifier onto a small set of interpretable observables.
Identified previously overlooked physical observables that improve classification.
Improved the interpretability of machine-learned classifiers in collider physics.
Abstract
We present a technique for translating a black-box machine-learned classifier operating on a high-dimensional input space into a small set of human-interpretable observables that can be combined to make the same classification decisions. We iteratively select these observables from a large space of high-level discriminants by finding those with the highest decision similarity relative to the black box, quantified via a metric we introduce that evaluates the relative ordering of pairs of inputs. Successive iterations focus only on the subset of input pairs that are misordered by the current set of observables. This method enables simplification of the machine-learning strategy, interpretation of the results in terms of well-understood physical concepts, validation of the physical model, and the potential for new insights into the nature of the problem itself. As a demonstration, we apply…
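The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the decision-similarity metric, so the version here is an assumption: it scores a candidate observable by the fraction of input pairs that it orders the same way as the black box. The function names (`pair_agreement`, `ordering_similarity`, `select_observable`), the candidate names `obs_x` and `obs_y`, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def pair_agreement(f_a, f_b, g_a, g_b):
    """True where f and g rank the pair (a, b) in the same order.

    Assumed form of the pairwise comparison; the paper's exact
    definition is not given in the abstract.
    """
    return (f_a - f_b) * (g_a - g_b) > 0

def ordering_similarity(f_a, f_b, g_a, g_b):
    """Fraction of pairs on which f and g agree about the ordering."""
    return float(np.mean(pair_agreement(f_a, f_b, g_a, g_b)))

def select_observable(f_a, f_b, cand_a, cand_b):
    """Pick the candidate observable most similar to the black box f.

    cand_a / cand_b map an observable name to its values on the first /
    second element of each pair.
    """
    scores = {
        name: ordering_similarity(f_a, f_b, cand_a[name], cand_b[name])
        for name in cand_a
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy data: black-box outputs on three pairs of inputs (a_i, b_i).
f_a = np.array([0.9, 0.8, 0.3])
f_b = np.array([0.1, 0.6, 0.7])

# Two hypothetical candidate observables evaluated on the same pairs.
cand_a = {"obs_x": np.array([5.0, 1.0, 2.0]), "obs_y": np.array([1.0, 2.0, 3.0])}
cand_b = {"obs_x": np.array([1.0, 3.0, 4.0]), "obs_y": np.array([2.0, 1.0, 1.0])}

best, scores = select_observable(f_a, f_b, cand_a, cand_b)

# Pairs the chosen observable misorders; a later iteration would
# restrict the search to this subset, as the abstract describes.
misordered = ~pair_agreement(f_a, f_b, cand_a[best], cand_b[best])
```

In this toy case `obs_x` agrees with the black box on two of the three pairs and `obs_y` on one, so `obs_x` is selected and only the second pair remains for the next iteration. The sketch scores each observable on its own and does not handle observables that are anti-correlated with the black box. The abstract also says the selected observables are combined to make the classification decision, and how they are combined is not shown here.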
