FAIR Universe HiggsML Uncertainty Dataset and Competition
Lisa Benato, Wahid Bhimji, Paolo Calafiura, Ragansu Chakkappai, Po-Wen Chang, Yuan-Tang Chou, Sascha Diefenbacher, Jordan Dudley, Ibrahim Elsharkawy, Steven Farrell, Aishik Ghosh, Cristina Giordano, Isabelle Guyon, Chris Harris, Yota Hashizume, Shih-Chieh Hsu, Elham E. Khoda

TL;DR
The paper presents the FAIR Universe HiggsML Uncertainty Dataset and Competition, which aims to improve the measurement of Higgs boson properties by benchmarking machine learning methods on a large simulated dataset with systematic uncertainties.
Contribution
It introduces a large, publicly available dataset and challenge for benchmarking ML techniques in particle physics uncertainty quantification, highlighting novel approaches like Contrastive Normalising Flows.
Findings
Top methods include Contrastive Normalising Flows and density ratio estimation.
The dataset enables long-term benchmarking of uncertainty estimation techniques.
The challenge fosters collaboration between physics and machine learning communities.
Abstract
The FAIR Universe HiggsML Uncertainty Challenge focused on measuring the physical properties of elementary particles with imperfect simulators. Participants were required to compute and report confidence intervals for a parameter of interest regarding the Higgs boson while accounting for various systematic (epistemic) uncertainties. The dataset is tabular, with 28 features and 280 million instances. Each instance represents a simulated proton-proton collision as observed at CERN's Large Hadron Collider in Geneva, Switzerland. The features of these simulations were chosen to capture key characteristics of different types of particles. These include primary attributes, such as the energy and three-dimensional momentum of the particles, as well as derived attributes, which are calculated from the primary ones using domain-specific knowledge. Additionally, a label feature designates…
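To make the distinction between primary and derived attributes concrete, the sketch below computes two standard kinematic quantities from energy and three-dimensional momentum. It is illustrative only: the function names, array layout, and example values are assumptions, not the dataset's actual column names or the organisers' code. Natural units (c = 1) are assumed, with energies and momenta in GeV.

```python
import numpy as np

def transverse_momentum(px, py):
    """Momentum component perpendicular to the beam (z) axis."""
    return np.hypot(px, py)

def invariant_mass(e1, p1, e2, p2):
    """Invariant mass of a two-particle system.

    e1, e2: arrays of shape (n,) with particle energies.
    p1, p2: arrays of shape (n, 3) with (px, py, pz) momenta.
    """
    e_tot = e1 + e2
    p_tot = p1 + p2
    m_squared = e_tot**2 - np.sum(p_tot**2, axis=-1)
    # Clip small negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(m_squared, 0.0))

# Hypothetical event: two massless particles emitted back to back.
e1 = np.array([50.0])
p1 = np.array([[30.0, 40.0, 0.0]])
e2 = np.array([50.0])
p2 = np.array([[-30.0, -40.0, 0.0]])

pt = transverse_momentum(p1[:, 0], p1[:, 1])
mass = invariant_mass(e1, p1, e2, p2)
print(pt, mass)
```

For this hypothetical event the total momentum cancels, so the invariant mass equals the summed energy. Derived quantities of this kind compress several primary attributes into a single feature that carries physical meaning.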
Taxonomy
Topics: Research Data Management Practices · Scientific Computing and Data Management · Big Data and Business Intelligence
