TL;DR
This paper introduces an unsupervised learning method using a Nearest Neighbors approach to detect discrepancies between datasets, aiding the search for new physics phenomena in high-dimensional data without relying on specific models.
Contribution
It presents a novel statistical test for identifying differences between two datasets. The test is model-independent and non-parametric, and it does not bin the data, which makes it applicable to complex, multidimensional data in physics research.
Findings
Effective on synthetic Gaussian data
Successfully applied to a simulated dark matter signal at the Large Hadron Collider
Identifies regions of interest even with imperfect background models
Abstract
We propose a new scientific application of unsupervised learning techniques to boost our ability to search for new phenomena in data, by detecting discrepancies between two datasets. These could be, for example, a simulated standard-model background, and an observed dataset containing a potential hidden signal of New Physics. We build a statistical test upon a test statistic which measures deviations between two samples, using a Nearest Neighbors approach to estimate the local ratio of the density of points. The test is model-independent and non-parametric, requiring no knowledge of the shape of the underlying distributions, and it does not bin the data, thus retaining full information from the multidimensional feature space. As a proof-of-concept, we apply our method to synthetic Gaussian data, and to a simulated dark matter signal at the Large Hadron Collider. Even in the case where…
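The idea of a nearest-neighbors estimate of the local density ratio can be illustrated with a minimal sketch. It is not taken from the paper: the function names (`knn_distance`, `density_ratio_statistic`, `permutation_p_value`), the choice of k = 5, the brute-force distance computation, and the use of a permutation test to obtain a p-value are all assumptions made for illustration. The statistic averages the log of the ratio between two standard kNN density estimates, evaluated at each point of the trial sample.

```python
import numpy as np


def knn_distance(queries, reference, k, exclude_self=False):
    """Distance from each query point to its k-th nearest neighbor in `reference`."""
    # Brute-force pairwise Euclidean distances, shape (n_queries, n_reference).
    diff = queries[:, None, :] - reference[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)
    # If queries and reference are the same sample, column 0 holds each
    # point's zero distance to itself, so the k-th neighbor is at index k.
    return dist[:, k] if exclude_self else dist[:, k - 1]


def density_ratio_statistic(trial, benchmark, k=5):
    """Mean log ratio of kNN density estimates (trial over benchmark).

    Illustrative assumption: density at x is estimated as
    k / (n * volume of the ball reaching the k-th neighbor), so the
    ball-volume constants cancel in the ratio. Assumes no duplicate points.
    """
    n_t, d = trial.shape
    n_b = benchmark.shape[0]
    r_t = knn_distance(trial, trial, k, exclude_self=True)
    r_b = knn_distance(trial, benchmark, k)
    return d * np.mean(np.log(r_b / r_t)) + np.log(n_b / (n_t - 1))


def permutation_p_value(trial, benchmark, k=5, n_perm=200, seed=0):
    """Observed statistic and a permutation p-value (assumed procedure)."""
    rng = np.random.default_rng(seed)
    observed = density_ratio_statistic(trial, benchmark, k)
    pooled = np.vstack([trial, benchmark])
    n_t = trial.shape[0]
    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled points and split them into two pseudo-samples.
        perm = rng.permutation(pooled.shape[0])
        ts = density_ratio_statistic(pooled[perm[:n_t]], pooled[perm[n_t:]], k)
        count += int(ts >= observed)
    # The +1 terms keep the estimated p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)


# Toy usage on synthetic 2D Gaussian data (hypothetical sample sizes and shift).
rng = np.random.default_rng(1)
benchmark = rng.normal(0.0, 1.0, size=(300, 2))
same = rng.normal(0.0, 1.0, size=(300, 2))
shifted = rng.normal(1.0, 1.0, size=(300, 2))

ts_same, p_same = permutation_p_value(same, benchmark)
ts_shift, p_shift = permutation_p_value(shifted, benchmark)
print("same distribution:   ", ts_same, p_same)
print("shifted distribution:", ts_shift, p_shift)
```

Because the statistic is computed point by point, the individual log ratios could also be inspected to see where in feature space the two samples differ most. For larger samples, a tree-based neighbor search would likely replace the brute-force distance matrix used here.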
