Machine learning-powered data cleaning for LEGEND: a semi-supervised approach using affinity propagation and support vector machines
E. León, A. Li, M. A. Bahena Schott, B. Bos, M. Busch, J. R. Chapman, G. L. Duran, J. Gruszko, R. Henning, E. L. Martin, J. F. Wilkerson

TL;DR
This paper introduces a semi-supervised AI approach combining affinity propagation and support vector machines to efficiently clean data by removing non-physical events in the LEGEND experiment, enhancing detection sensitivity.
Contribution
The study presents a novel semi-supervised data cleaning method that uses affinity propagation and SVMs, tailored to the waveform data recorded by LEGEND's HPGe detectors.
Findings
The model sacrifices only 0.024% of physics events.
Effective clustering of waveform signals based on shape.
Improves data cleaning procedures for large-scale neutrino experiments.
Abstract
Neutrinoless double-beta decay ($0\nu\beta\beta$) is a rare nuclear process that, if observed, will provide insight into the nature of neutrinos and help explain the matter-antimatter asymmetry in the universe. The Large Enriched Germanium Experiment for Neutrinoless Double-Beta Decay (LEGEND) will operate in two phases to search for $0\nu\beta\beta$. The first (second) stage will employ 200 (1000) kg of High-Purity Germanium (HPGe) enriched in $^{76}$Ge to achieve a half-life sensitivity of $10^{27}$ ($10^{28}$) years. In this study, we present a semi-supervised, data-driven approach, powered by a novel artificial intelligence model, to remove non-physical events captured by HPGe detectors. We utilize Affinity Propagation to cluster waveform signals based on their shape and a Support Vector Machine to classify them into different categories. We train, optimize, test our model on data…
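The two-stage idea in the abstract (cluster waveforms by shape, then classify them) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The synthetic waveforms, the two-category labeling, the hyperparameters, and the helper names `make_waveforms`, `label_exemplar`, and `cluster_then_classify` are all assumptions made for illustration. The sketch assumes that a person labels only the cluster exemplars, and that those labels are then propagated to every cluster member to train the SVM.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.svm import SVC


def make_waveforms(n_per_class=30, n_samples=64, seed=0):
    """Build synthetic stand-in waveforms (hypothetical, NOT LEGEND data).

    Class 0: smooth rising edge (stand-in for a physical pulse).
    Class 1: narrow spike (stand-in for a non-physical event).
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_samples)
    step = 1.0 / (1.0 + np.exp(-(t - 0.5) * 40.0))
    spike = np.exp(-((t - 0.5) ** 2) / 0.001)
    X = np.vstack([
        tpl + 0.02 * rng.standard_normal((n_per_class, n_samples))
        for tpl in (step, spike)
    ])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def label_exemplar(waveform):
    """Hypothetical stand-in for a human inspecting one exemplar.

    Returns 0 if the waveform ends on a high plateau, else 1.
    """
    return 0 if waveform[-8:].mean() > 0.5 else 1


def cluster_then_classify(X, seed=0):
    """Cluster by shape, label exemplars only, then train an SVM."""
    # Stage 1: unsupervised clustering; the number of clusters is not fixed.
    ap = AffinityPropagation(damping=0.9, max_iter=1000, random_state=seed)
    ap.fit(X)
    exemplar_idx = ap.cluster_centers_indices_
    if len(exemplar_idx) == 0:
        raise RuntimeError("Affinity propagation did not converge")

    # Stage 2: label only the exemplars, then copy each exemplar's
    # label to every waveform assigned to its cluster.
    exemplar_labels = np.array([label_exemplar(X[i]) for i in exemplar_idx])
    y_propagated = exemplar_labels[ap.labels_]

    # Stage 3: supervised classifier trained on the propagated labels.
    clf = SVC(kernel="rbf", gamma="scale").fit(X, y_propagated)
    return ap, clf


if __name__ == "__main__":
    X_train, _ = make_waveforms(seed=0)
    ap, clf = cluster_then_classify(X_train)
    X_new, y_new = make_waveforms(seed=1)
    print("clusters found:", len(ap.cluster_centers_indices_))
    print("accuracy on fresh waveforms:", (clf.predict(X_new) == y_new).mean())
```

In this sketch the true labels are never used for training; they serve only to check the classifier on fresh waveforms. The appeal of the design is that manual inspection scales with the number of clusters rather than the number of events.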
Taxonomy
Topics: Data Quality and Management · Big Data and Business Intelligence · Privacy-Preserving Technologies in Data
