The Physics Behind ML-based Quark-Gluon Taggers

Sophia Vent; Ramon Winterhalder; Tilman Plehn

arXiv:2507.21214 · hep-ph · April 9, 2026

TL;DR

This paper explores the physics principles behind ML-based quark-gluon taggers, using explainability methods like feature importance and symbolic regression to understand and approximate their behavior.

Contribution

It introduces a physics-informed analysis of ML taggers, applying Shapley values and symbolic regression to interpret and simplify their decision-making process.
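The problem that correlated inputs cause for Shapley values shows up in a minimal toy sketch. This is our own illustration, not the paper's setup. The code computes exact Shapley values by enumerating all coalitions. It uses the marginal (interventional) value function v(S) = mean over background samples b of f(x_S, b_rest), which fills the absent features independently of the present ones. The two models `f_a` and `f_b`, the input point and the background set are all hypothetical.

```python
import itertools
import math

import numpy as np


def interventional_shapley(f, x, background):
    """Exact Shapley values of f at point x.

    Value function: v(S) = mean over background rows b of f(x_S, b_notS).
    Features outside S are taken from the background set independently of
    x_S, so any correlation between the inputs is ignored."""
    n = len(x)

    def value(subset):
        z = background.copy()
        idx = list(subset)
        if idx:
            z[:, idx] = x[idx]  # clamp the coalition features to x
        return float(f(z).mean())

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of a coalition S that excludes i
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
# toy data: two perfectly correlated inputs (the second is a copy of the first)
background = np.column_stack([x1, x1])


# two hypothetical models that agree everywhere on the data manifold x2 == x1
def f_a(X):
    return 2.0 * X[:, 0]


def f_b(X):
    return X[:, 0] + X[:, 1]


x = np.array([1.0, 1.0])
phi_a = interventional_shapley(f_a, x, background)
phi_b = interventional_shapley(f_b, x, background)
print("f_a attributions:", phi_a)
print("f_b attributions:", phi_b)
```

Both models give identical outputs on every background sample, because the second input always equals the first there. Even so, `f_a` assigns all of the attribution to the first input and none to the second, while `f_b` splits it evenly. The difference comes entirely from evaluating the models at off-manifold points where the two inputs disagree. This is the kind of distortion the independence assumption can introduce.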

Findings

1. Identified key latent features correlating with physics observables.
2. Demonstrated limitations of standard Shapley values due to input correlations.
3. Derived compact formulas approximating the ML tagger outputs.
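The first finding rests on correlating latent features with physics observables. A minimal sketch of one linear route follows. It assumes the latent space is available as an array of per-jet vectors, takes the leading principal components, and computes their Pearson correlation with an observable. A rank (Spearman) correlation is added as one simple way to also catch monotonic non-linear relations. The toy latent space, the observable `obs`, and the choice of PCA and Spearman are our assumptions. They are not necessarily the methods used in the paper.

```python
import numpy as np


def pearson(a, b):
    """Linear (Pearson) correlation coefficient."""
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a, b):
    """Rank (Spearman) correlation: Pearson on ranks, so any monotonic
    non-linear relation counts as perfect correlation. Assumes no ties."""
    return pearson(np.argsort(np.argsort(a)), np.argsort(np.argsort(b)))


def leading_components(latent, k=2):
    """Project latent vectors onto their top-k principal directions (PCA via SVD)."""
    centered = latent - latent.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


rng = np.random.default_rng(0)
n_jets = 5000
# hypothetical per-jet observable (toy stand-in, e.g. a multiplicity-like quantity)
obs = rng.gamma(shape=4.0, scale=5.0, size=n_jets)
# toy 8-dim "latent space": one direction encodes log(obs), the rest is noise
latent = rng.normal(size=(n_jets, 8))
latent[:, 0] += 3.0 * np.log(obs)

pcs = leading_components(latent, k=2)
for j in range(2):
    print(f"PC{j + 1}: pearson={pearson(pcs[:, j], obs):+.2f}  "
          f"spearman={spearman(pcs[:, j], obs):+.2f}")
```

Principal components have an arbitrary sign, so only the magnitude of each correlation carries meaning. In this toy setup the first component tracks `obs`, while the second is essentially uncorrelated noise.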

Abstract

Jet taggers provide an ideal testbed for applying explainability techniques to powerful ML tools. For theoretically and experimentally challenging quark-gluon tagging, we first identify the leading latent features that correlate strongly with physics observables, both in a linear and a non-linear approach. Next, we show how Shapley values can assess feature importance, although the standard implementation assumes independent inputs and can lead to distorted attributions in the presence of correlations. Finally, we use symbolic regression to derive compact formulas to approximate the tagger output.
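The last step in the abstract is fitting a compact formula to the tagger output. Genuine symbolic regression searches over expression trees, often with evolutionary algorithms. The sketch below swaps in a much simpler sparse-regression stand-in: sequentially thresholded least squares over a fixed library of candidate terms. It only conveys the idea of recovering a short closed-form expression. It is not the method used in the paper. The inputs `a` and `b`, the hidden formula, the term library and the threshold are all hypothetical.

```python
import numpy as np


def stlsq(theta, y, threshold=0.02, n_iter=10):
    """Sequentially thresholded least squares: fit y ~ theta @ xi, then
    repeatedly drop terms with |coefficient| < threshold and refit the rest."""
    xi = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(xi) >= threshold
        xi[~keep] = 0.0
        if not keep.any():
            break
        xi[keep] = np.linalg.lstsq(theta[:, keep], y, rcond=None)[0]
    return xi


rng = np.random.default_rng(1)
n = 2000
# two hypothetical input observables (toy ranges, not taken from the paper)
a = rng.uniform(1.0, 10.0, n)
b = rng.uniform(0.0, 1.0, n)
# hypothetical "tagger output" with a hidden compact form plus small noise
y = 0.3 + 0.05 * a + 0.4 * a * b + rng.normal(scale=1e-3, size=n)

# library of candidate terms the formula may be built from
names = ["1", "a", "b", "a^2", "b^2", "a*b", "log(a)"]
theta = np.column_stack([np.ones(n), a, b, a**2, b**2, a * b, np.log(a)])

xi = stlsq(theta, y)
formula = " + ".join(f"{c:.3f}*{t}" for c, t in zip(xi, names) if c != 0.0)
print("y ~", formula)
```

The threshold acts on raw coefficients, so candidate terms should be on comparable scales, or standardized, before fitting. Dedicated symbolic-regression tools such as PySR avoid a fixed term library altogether and trade formula complexity against accuracy.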
