The Physics Behind ML-based Quark-Gluon Taggers
Sophia Vent, Ramon Winterhalder, Tilman Plehn

TL;DR
This paper explores the physics principles behind ML-based quark-gluon taggers, using explainability methods like feature importance and symbolic regression to understand and approximate their behavior.
Contribution
It introduces a physics-informed analysis of ML taggers, applying Shapley values and symbolic regression to interpret and simplify their decision-making process.
Findings
Identified key latent features correlating with physics observables.
Demonstrated limitations of standard Shapley values due to input correlations.
Derived compact formulas approximating the ML tagger outputs.
Abstract
Jet taggers provide an ideal testbed for applying explainability techniques to powerful ML tools. For theoretically and experimentally challenging quark-gluon tagging, we first identify the leading latent features that correlate strongly with physics observables, in both a linear and a non-linear approach. Next, we show how Shapley values can assess feature importance, although the standard implementation assumes independent inputs and can lead to distorted attributions in the presence of correlations. Finally, we use symbolic regression to derive compact formulas that approximate the tagger output.
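The point about Shapley values and correlated inputs can be illustrated with a minimal toy sketch. It is not the paper's tagger or its implementation. Everything in it is an assumption made for illustration: a two-feature "model" that reads only feature 0, Gaussian inputs with correlation `rho`, and the hypothetical helper names `shapley_values`, `v_marginal`, and `v_conditional`. The marginal value function fills in missing features from background samples as if the inputs were independent, while the conditional one uses the analytic Gaussian expectation given the known features.

```python
import itertools
import math
import numpy as np

def shapley_values(value_fn, n_features):
    """Exact Shapley values from a coalition value function v(S)."""
    phi = np.zeros(n_features)
    n_fact = math.factorial(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(k) * math.factorial(n_features - k - 1) / n_fact
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

def model(x):
    # Toy stand-in for a tagger output: depends on feature 0 only (assumption).
    return x[..., 0]

rho = 0.9  # assumed correlation between the two inputs
rng = np.random.default_rng(0)
cov = np.array([[1.0, rho], [rho, 1.0]])
background = rng.multivariate_normal(np.zeros(2), cov, size=5000)
x = np.array([1.0, 1.0])  # the instance being explained

def v_marginal(s):
    # Features outside s are drawn from the background sample,
    # ignoring their correlation with the features fixed in s.
    samples = background.copy()
    for j in s:
        samples[:, j] = x[j]
    return model(samples).mean()

def v_conditional(s):
    # Analytic E[f(X) | X_S = x_S] for a zero-mean, unit-variance
    # bivariate Gaussian with correlation rho.
    if 0 in s:
        return x[0]
    if 1 in s:
        return rho * x[1]
    return 0.0

phi_marginal = shapley_values(v_marginal, 2)
phi_conditional = shapley_values(v_conditional, 2)
print("marginal:   ", phi_marginal)
print("conditional:", phi_conditional)
```

In this toy setup the marginal version assigns no importance to feature 1, while the conditional version gives it a share of the credit because it carries information about feature 0. The two attributions differ only because of the input correlation, which is the kind of effect the abstract refers to.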
