Look everywhere effects in anomaly detection
Marie Hein, Benjamin Nachman, David Shih

TL;DR
This paper investigates the look elsewhere effect in machine learning-based anomaly detection, highlighting calibration challenges and proposing k-folding as a balanced solution, supported by numerical and collider physics studies.
Contribution
It provides a detailed analysis of calibration issues in anomaly detection and evaluates methods like data splitting and k-folding to optimize sensitivity and statistical validity.
Findings
Training and testing on the same data leads to miscalibrated p-values.
If those p-values can be calibrated, training and testing on the same data may offer the best sensitivity, since it uses all of the data.
K-folding offers a good balance between calibration and sensitivity.
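The findings above can be illustrated with a small background-only toy. The sketch below is not the paper's weakly supervised setup: the "anomaly detector" is a stand-in that simply picks the most populated of several equally likely bins, and every function name and parameter is hypothetical. It compares the naive p-value from training and testing on the same events with a k-folded version in which each event is only tested by a selection trained on the other folds. The binomial p-value for the k-folded count ignores correlations between folds, so it should be read as approximately, not exactly, calibrated.

```python
import math
import random


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_bin(events, n_bins):
    """Toy 'training': return the index of the most populated bin."""
    counts = [0] * n_bins
    for e in events:
        counts[e] += 1
    return max(range(n_bins), key=counts.__getitem__)


def p_same_data(events, n_bins):
    """Select the bin and count the excess on the same events (naive p-value)."""
    b = best_bin(events, n_bins)
    k = sum(1 for e in events if e == b)
    return binom_tail(k, len(events), 1 / n_bins)


def p_kfold(events, n_bins, k_folds):
    """Select the bin on k-1 folds, count on the held-out fold, sum over folds."""
    folds = [events[i::k_folds] for i in range(k_folds)]
    selected = 0
    for i, held_out in enumerate(folds):
        train = [e for j, f in enumerate(folds) if j != i for e in f]
        b = best_bin(train, n_bins)
        selected += sum(1 for e in held_out if e == b)
    # Assumption: treat the summed count as binomial, ignoring fold correlations.
    return binom_tail(selected, len(events), 1 / n_bins)


def false_positive_rates(n_toys=300, n_events=100, n_bins=10,
                         k_folds=5, alpha=0.1, seed=0):
    """Fraction of background-only toys with p < alpha for each strategy."""
    rng = random.Random(seed)
    same = kfold = 0
    for _ in range(n_toys):
        events = [rng.randrange(n_bins) for _ in range(n_events)]
        same += p_same_data(events, n_bins) < alpha
        kfold += p_kfold(events, n_bins, k_folds) < alpha
    return same / n_toys, kfold / n_toys


if __name__ == "__main__":
    same, kfold = false_positive_rates()
    print(f"same-data false-positive rate at alpha=0.1: {same:.2f}")
    print(f"k-fold false-positive rate at alpha=0.1:    {kfold:.2f}")
```

For calibrated p-values, the background-only false-positive rate at a threshold alpha should not exceed alpha. In this toy the same-data strategy is expected to exceed it by a wide margin, because the selection has already latched onto the largest fluctuation, while the k-folded strategy should stay close to alpha.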
Abstract
Machine learning-based anomaly detection methods are able to search high-dimensional spaces for hints of new physics with much less theory bias than traditional searches. However, by searching in many directions all at once, the statistical power of these search strategies is diluted by a variant of the look elsewhere effect. We examine this challenge in detail, focusing on weakly supervised methods. We find that training and testing on the same data results in badly miscalibrated p-values due to the anomaly detector searching everywhere in the data and overfitting on statistical fluctuations. However, if these p-values can be calibrated, they may offer the best sensitivity to anomalies, since this approach uses all of the data. Conversely, training on half of the data and testing on the other half results in perfectly calibrated p-values, but at the cost of reduced sensitivity to…
Taxonomy
Topics: Particle physics theoretical and experimental studies · Computational Physics and Python Applications · Anomaly Detection Techniques and Applications
