What exactly did the Transformer learn from our physics data?
Martin Erdmann, Niklas Langner, Josina Schulte, Dominik Wirtz

TL;DR
This paper investigates what Transformer networks learn from physics data, using ultra-high-energy cosmic ray simulations, and finds that they acquire physically meaningful features.
Contribution
By analyzing trained positional encodings and attention values, it shows that Transformers trained on physics data learn plausible, physically meaningful features.
Findings
Trained positional encodings capture features consistent with the azimuthal symmetry of air showers.
Attention values highlight cosmic particles originating from sources in a galaxy catalog.
Transformers acquire physically meaningful representations.
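One way to probe a claim like the first finding is to test whether a learned positional encoding depends only on the distance from a chosen center. The sketch below is a hypothetical illustration, not the authors' analysis. It assumes positions are given in 2D relative to that center (for instance the shower core), and that `encode` is any trained map from a position to an embedding vector. The score is the mean spread of each embedding dimension over azimuth at fixed radius, so it is zero for a perfectly azimuthally symmetric encoding. `radial_encoding` and `x_biased_encoding` are made-up stand-ins, not encodings from the paper.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def azimuthal_asymmetry(
    encode: Callable[[float, float], Vector],
    radii: Sequence[float],
    n_angles: int = 36,
) -> float:
    """Mean per-dimension standard deviation of embeddings over azimuth.

    For each radius, sample n_angles equally spaced points on a circle,
    encode them, and measure how much each embedding dimension varies
    with the angle. Returns ~0.0 for an encoding that depends on radius only.
    """
    total, count = 0.0, 0
    for r in radii:
        points = [
            (r * math.cos(2 * math.pi * k / n_angles),
             r * math.sin(2 * math.pi * k / n_angles))
            for k in range(n_angles)
        ]
        embeddings = [encode(x, y) for x, y in points]
        for d in range(len(embeddings[0])):
            values = [e[d] for e in embeddings]
            mean = sum(values) / n_angles
            var = sum((v - mean) ** 2 for v in values) / n_angles
            total += math.sqrt(var)
            count += 1
    return total / count


# Hypothetical stand-ins for a trained encoding (assumptions, not from the paper).
def radial_encoding(x: float, y: float) -> list[float]:
    r = math.hypot(x, y)  # depends on the radius only
    return [math.exp(-r), math.cos(r), math.sin(r)]


def x_biased_encoding(x: float, y: float) -> list[float]:
    return [math.exp(-math.hypot(x, y)), x]  # second dimension varies with azimuth


radii = [0.5, 1.0, 2.0]
print(azimuthal_asymmetry(radial_encoding, radii))    # ~0: depends on radius only
print(azimuthal_asymmetry(x_biased_encoding, radii))  # ~0.41: varies with azimuth
```

Averaging the per-dimension spread is only one simple choice. Comparing cosine similarities between embeddings at equal radius would be an equally reasonable probe.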
Abstract
Transformer networks excel in scientific applications. We explore two scenarios in ultra-high-energy cosmic ray simulations to examine what these network architectures learn. First, we investigate the trained positional encodings in air showers which are azimuthally symmetric. Second, we visualize the attention values assigned to cosmic particles originating from a galaxy catalog. In both cases, the Transformers learn plausible, physically meaningful features.
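The "attention values assigned to cosmic particles" are the weights an attention layer places on each input particle. Below is a minimal single-head sketch in pure Python. It assumes one query vector (for instance a class token) attends over per-particle key vectors using scaled dot-product attention. The toy vectors are made up for illustration and are not data from the paper. Ranking particles by weight is one simple way such values can be inspected.

```python
import math
from typing import Sequence


def attention_weights(query: Sequence[float],
                      keys: Sequence[Sequence[float]]) -> list[float]:
    """Scaled dot-product attention weights softmax(q . k_i / sqrt(d)) for one query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical toy data: 4 particles with 2D keys; the query aligns best with particle 2.
query = [1.0, 0.0]
keys = [[0.1, 0.9], [-0.5, 0.2], [2.0, 0.0], [0.3, -0.3]]
weights = attention_weights(query, keys)

# Particle indices sorted from highest to lowest attention weight.
ranking = sorted(range(len(keys)), key=lambda i: weights[i], reverse=True)
print(ranking)
```

In a trained model the query and keys come from learned projections of the inputs, and multi-head layers yield one such weight vector per head.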
