Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh, Samuel Klein, François Charton, Tobias Golling, Lukas Heinrich, Michael Kagan, Inês Ochoa, Margarita Osadchy

TL;DR
This paper improves masked particle modeling (MPM) with new reconstruction methods that require no data tokenization or discretization and outperform the tokenized objective, supporting the development of foundation models for high-energy physics.
Contribution
It presents novel reconstruction techniques using conditional generative models that improve masked particle modeling performance in high-energy physics applications.
Findings
New methods outperform tokenized MPM on jet physics tasks
A more powerful decoder and fixes to implementation inefficiencies improve performance over previous MPM work
Applicable to various downstream tasks like classification and vertex finding
Abstract
In this work, we significantly enhance masked particle modeling (MPM), a self-supervised learning scheme for constructing highly expressive representations of unordered sets relevant to developing foundation models for high-energy physics. In MPM, a model is trained to recover the missing elements of a set, a learning objective that requires no labels and can be applied directly to experimental data. We achieve significant performance improvements over previous work on MPM by addressing inefficiencies in the implementation and incorporating a more powerful decoder. We compare several pre-training tasks and introduce new reconstruction methods that utilize conditional generative models without data tokenization or discretization. We show that these new methods outperform the tokenized learning objective from the original MPM on a new test bed for foundation models for jets, which…
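The masking objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 30% mask fraction, the zero-valued placeholder for hidden particles, and the plain mean-squared-error loss are all assumptions chosen for illustration. The paper's new methods use conditional generative models for reconstruction, and the original MPM used tokenized targets; the regression loss here is only a simple stand-in that avoids discretization.

```python
import numpy as np

def mask_particles(jet, mask_fraction=0.3, rng=None):
    """Hide a random subset of a jet's constituents (hypothetical helper).

    jet: array of shape (n_particles, n_features), treated as an unordered set.
    Returns (masked_jet, mask), where mask[i] is True for hidden particles.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = jet.shape[0]
    n_masked = max(1, int(round(mask_fraction * n)))
    idx = rng.choice(n, size=n_masked, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    masked_jet = jet.copy()
    masked_jet[mask] = 0.0  # assumption: zeros stand in for a learned mask token
    return masked_jet, mask

def masked_regression_loss(pred, target, mask):
    """Mean squared error evaluated on the masked particles only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

# Usage: a toy "jet" of 10 particles with 3 features each.
rng = np.random.default_rng(0)
jet = rng.normal(size=(10, 3))
masked_jet, mask = mask_particles(jet, mask_fraction=0.3, rng=rng)

# A real model would predict the hidden particles from masked_jet;
# here the masked input itself serves as a trivial baseline prediction.
loss = masked_regression_loss(masked_jet, jet, mask)
```

Because the loss is computed only where `mask` is True, no labels are needed: the training target is the data itself, which is why the objective can be applied directly to experimental data.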
Taxonomy
Topics: Electron and X-Ray Spectroscopy Techniques · Enhanced Oil Recovery Techniques · Granular flow and fluidized beds
