Particle Trajectory Representation Learning with Masked Point Modeling

Sam Young; Yeon-jae Jwa; Kazuhiro Terao

arXiv:2502.02558 · hep-ex · March 12, 2026

TL;DR

This paper introduces PoLAr-MAE, a self-supervised learning method for LArTPC data that learns meaningful particle trajectory representations with high data efficiency, reducing the need for large labeled datasets.

Contribution

The paper presents a novel masked point modeling approach tailored for LArTPC images, achieving high performance with minimal labeled data and demonstrating emergent instance segmentation capabilities.

Findings

01. Achieves state-of-the-art segmentation with only 100 labeled events

02. Learns physically meaningful trajectory representations from unlabeled data

03. Releases a large dataset to facilitate further research

Abstract

Effective self-supervised learning (SSL) techniques have been key to unlocking large datasets for representation learning. While many promising methods have been developed using online corpora and captioned photographs, their application to scientific domains, where data encodes highly specialized knowledge, remains a challenge. Liquid Argon Time Projection Chambers (LArTPCs) provide high-resolution 3D imaging for fundamental physics, but analysis of their sparse, complex point cloud data often relies on supervised methods trained on large simulations, introducing potential biases. We introduce the Point-based Liquid Argon Masked Autoencoder (PoLAr-MAE), applying masked point modeling to unlabeled LArTPC images using domain-specific volumetric tokenization and energy prediction. We show this SSL approach learns physically meaningful trajectory representations directly from data. This…
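As a rough illustration of the masked point modeling setup the abstract describes, the sketch below groups a point cloud into volumetric tokens, hides a random subset of tokens, and keeps the hidden points (coordinates and energy) as reconstruction targets. It is a minimal sketch under stated assumptions, not the authors' implementation: the uniform voxel-grid grouping stands in for the paper's domain-specific volumetric tokenization, which this page does not describe, and the function names, `voxel_size`, and `mask_ratio` values are all hypothetical.

```python
import numpy as np


def voxel_tokenize(points, voxel_size):
    """Assign each point (x, y, z, energy) to a token via a coarse voxel grid.

    Assumption: a uniform grid is used here only as a simple stand-in for
    the paper's volumetric tokenization.
    """
    keys = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # One token per occupied voxel; `inverse` maps each point to its token id.
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # shape differs across NumPy versions
    return inverse, int(inverse.max()) + 1


def random_token_mask(num_tokens, mask_ratio, rng):
    """Return a boolean array marking which tokens are hidden from the encoder."""
    num_masked = int(round(mask_ratio * num_tokens))
    masked = np.zeros(num_tokens, dtype=bool)
    masked[rng.choice(num_tokens, size=num_masked, replace=False)] = True
    return masked


def make_masked_example(points, voxel_size=16.0, mask_ratio=0.6, seed=0):
    """Split one event into visible points and masked reconstruction targets.

    voxel_size and mask_ratio are hypothetical values chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    token_of_point, num_tokens = voxel_tokenize(points, voxel_size)
    token_masked = random_token_mask(num_tokens, mask_ratio, rng)
    point_masked = token_masked[token_of_point]
    visible = points[~point_masked]   # what an encoder would see
    targets = points[point_masked]    # positions and energies to predict
    return visible, targets, token_masked
```

Masking whole tokens rather than individual points means an entire local segment of a trajectory disappears at once, so a model trained on this split has to infer the missing geometry and energy from the surrounding context instead of interpolating between near neighbors.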

Code & Models

Datasets

DeepLearnPhysics/PILArNet-M
dataset · 27 downloads
