Resonant Anomaly Detection with Multiple Reference Datasets
Mayee F. Chen, Benjamin Nachman, Frederic Sala

TL;DR
This paper extends resonant anomaly detection techniques in high energy physics to use multiple reference datasets, improving detection performance and providing finite-sample guarantees.
Contribution
The paper introduces generalizations of CWoLa and SALAD, built on weak supervision techniques, that leverage multiple reference datasets, improving anomaly detection and offering finite-sample guarantees.
Findings
Improved detection performance with multiple datasets
Finite-sample guarantees that improve on existing asymptotic analyses
Validated on realistic and synthetic data
Abstract
An important class of techniques for resonant anomaly detection in high energy physics builds models that can distinguish between reference and target datasets, where only the latter has appreciable signal. Such techniques, including Classification Without Labels (CWoLa) and Simulation Assisted Likelihood-free Anomaly Detection (SALAD), rely on a single reference dataset. They cannot take advantage of commonly available multiple datasets and thus cannot fully exploit available information. In this work, we propose generalizations of CWoLa and SALAD for settings where multiple reference datasets are available, building on weak supervision techniques. We demonstrate improved performance in a number of settings with realistic and synthetic data. As an added benefit, our generalizations enable us to provide finite-sample guarantees, improving on existing asymptotic analyses.
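To make the reference-versus-target idea in the abstract concrete, here is a minimal sketch of the single-reference CWoLa baseline, not the multi-reference generalization the paper proposes. The Gaussian toy data, the sample sizes, the plain logistic-regression classifier, and the helper names `fit_logistic`, `score`, and `auc` are all illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (an assumption, not the paper's datasets):
# background ~ N(0, 1) in two features, signal ~ N(2, 0.5).
n_ref, n_bkg, n_sig = 5000, 5000, 500
reference = rng.normal(0.0, 1.0, size=(n_ref, 2))   # background only
target_bkg = rng.normal(0.0, 1.0, size=(n_bkg, 2))
target_sig = rng.normal(2.0, 0.5, size=(n_sig, 2))
target = np.vstack([target_bkg, target_sig])        # background + signal

# CWoLa-style labels: 0 = reference dataset, 1 = target dataset.
# No per-event signal/background labels are used for training.
X = np.vstack([reference, target])
y = np.concatenate([np.zeros(n_ref), np.ones(len(target))])


def add_bias(X):
    """Append a constant column so the model has an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])


def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit logistic regression by full-batch gradient descent."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def score(X, w):
    """Predicted probability that an event comes from the target dataset."""
    return 1.0 / (1.0 + np.exp(-add_bias(X) @ w))


def auc(pos, neg):
    """Fraction of (pos, neg) pairs in which the pos score is larger."""
    return float((pos[:, None] > neg[None, :]).mean())


w = fit_logistic(X, y)

# Truth labels are used only here, to check that the dataset-level
# classifier also separates signal from background.
signal_vs_background_auc = auc(score(target_sig, w), score(target_bkg, w))
print(f"signal vs. background AUC: {signal_vs_background_auc:.3f}")
```

The classifier is trained only to tell the two datasets apart, yet its score can be used to rank events by how signal-like they are, because in this toy setup the target differs from the reference only through its signal component.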
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Computational Physics and Python Applications · Algorithms and Data Compression
