What exactly did the Transformer learn from our physics data?

Martin Erdmann; Niklas Langner; Josina Schulte; Dominik Wirtz

arXiv:2505.21042·astro-ph.IM·April 14, 2026


TL;DR

This paper examines what Transformer networks learn from physics data in two ultra-high-energy cosmic ray simulation scenarios and finds that they acquire plausible, physically meaningful features.

Contribution

By analyzing trained positional encodings and visualizing attention values, it shows that Transformers trained on simulated cosmic ray data learn plausible, physically meaningful features.

Findings

01

In azimuthally symmetric air showers, the trained positional encodings capture plausible, physically meaningful structure.

02

Visualized attention values for cosmic particles originating from a galaxy catalog show physically plausible patterns.

03

Transformers acquire physically meaningful representations.

Abstract

Transformer networks excel in scientific applications. We explore two scenarios in ultra-high-energy cosmic ray simulations to examine what these network architectures learn. First, we investigate the trained positional encodings in air showers which are azimuthally symmetric. Second, we visualize the attention values assigned to cosmic particles originating from a galaxy catalog. In both cases, the Transformers learn plausible, physically meaningful features.
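
To make concrete what "attention values" refers to, here is a minimal sketch of standard scaled dot-product attention weights in plain Python. The function name `attention_weights` and the toy two-dimensional embeddings are assumptions made for illustration; they are not taken from the paper, and the authors' model and visualization may differ.

```python
import math

def attention_weights(queries, keys):
    """Row-wise softmax of scaled dot products between queries and keys."""
    d = len(keys[0])  # embedding dimension used for the 1/sqrt(d) scaling
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical toy embeddings for three "particles" (illustrative only).
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights = attention_weights(tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to one and shows how strongly one token attends to every other token. Plotting such rows over the input particles is one common way to inspect what a trained Transformer focuses on.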
