Enhancing next token prediction based pre-training for jet foundation models
Joschka Birk, Anna Hallin, Gregor Kasieczka, Nikol Madzharova, Ian Pang, David Shih

TL;DR
This paper introduces improvements to next token prediction pre-training for jet foundation models, utilizing a hybrid input approach and combined objectives to enhance downstream classification without sacrificing generative abilities.
Contribution
It proposes a hybrid input setup and combined pre-training strategies that significantly boost downstream classification performance in jet models.
Findings
Enhanced downstream classification accuracy
Maintained generative performance
Effective hybrid input and combined objectives
Abstract
Next token prediction is an attractive pre-training task for jet foundation models, in that it is simulation free and enables excellent generative capabilities that can transfer across datasets. Here we study multiple improvements to next token prediction, building on the initial work of OmniJet-α. First, instead of tokenizing particles and subsequently only using the token-ID as the model input for both the generative and the classification task, we adopt a hybrid setup, which allows us to use continuous feature vectors as model input while only using token-IDs in the next token prediction target. Second, we explore a pre-training strategy that combines masked particle modeling and generative learning objectives. Taken together, these changes greatly improve the performance in downstream classification tasks without any loss in generative performance.
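The hybrid setup described in the abstract can be illustrated with a minimal sketch: the model input at each position is a continuous feature vector, while the training target is the token-ID of the next particle. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy `CODEBOOK`, the nearest-neighbour `tokenize` function (a stand-in for a learned tokenizer), the `START_FEATURES` placeholder, the `STOP_ID` convention, and the (pT, eta, phi) feature choice.

```python
import math

# Hypothetical toy codebook: token ID -> representative (pT, eta, phi) features.
# A stand-in for a learned tokenizer; the values are made up for illustration.
CODEBOOK = [
    (1.0, 0.0, 0.0),
    (5.0, 0.1, -0.1),
    (20.0, -0.2, 0.2),
]
START_FEATURES = (0.0, 0.0, 0.0)  # assumed placeholder vector for the start position
STOP_ID = len(CODEBOOK)           # assumed extra ID marking the end of the jet


def tokenize(particle):
    """Return the ID of the nearest codebook entry (Euclidean distance)."""
    return min(range(len(CODEBOOK)), key=lambda i: math.dist(particle, CODEBOOK[i]))


def build_hybrid_example(particles):
    """Pair continuous inputs with discrete next-token targets.

    Position i sees the continuous features of the particles before it
    (preceded by a start vector) and is trained to predict the token ID
    of particle i; the final position predicts STOP_ID.
    """
    inputs = [START_FEATURES] + [tuple(p) for p in particles]
    targets = [tokenize(p) for p in particles] + [STOP_ID]
    return inputs, targets


# Toy jet with three particles, given as (pT, eta, phi).
jet = [(18.5, -0.15, 0.25), (4.2, 0.05, -0.12), (1.3, 0.0, 0.02)]
inputs, targets = build_hybrid_example(jet)

for features, target in zip(inputs, targets):
    print(features, "->", target)
```

In this sketch the tokenizer is only consulted when building targets, so the input side keeps the full-resolution features that a token-ID-only input would discard.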
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Machine Learning in Materials Science
