TL;DR
AutoSciDACT is a unified pipeline that combines contrastive embedding with hypothesis testing to detect and statistically validate novel phenomena in noisy, high-dimensional scientific datasets.
Contribution
The paper introduces AutoSciDACT, a framework that integrates contrastive pre-training with two-sample testing for statistically robust novelty detection in scientific data.
Findings
High sensitivity to small anomalies across diverse datasets
Effective statistical quantification of deviations
Applicable to multiple scientific domains
Abstract
Novelty detection in large scientific datasets faces two key challenges: the noisy and high-dimensional nature of experimental data, and the necessity of making statistically robust statements about any observed outliers. While there is a wealth of literature on anomaly detection via dimensionality reduction, most methods do not produce outputs compatible with quantifiable claims of scientific discovery. In this work we directly address these challenges, presenting the first step towards a unified pipeline for novelty detection adapted for the rigorous statistical demands of science. We introduce AutoSciDACT (Automated Scientific Discovery with Anomalous Contrastive Testing), a general-purpose pipeline for detecting novelty in scientific data. AutoSciDACT begins by creating expressive low-dimensional data representations using contrastive pre-training, leveraging the abundance of…
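To make the second stage of the pipeline concrete, here is a minimal sketch of a two-sample test run on low-dimensional embeddings. It is an illustration only, not the test used in the paper: the abstract shown here is truncated and does not name a statistic, so an energy-distance permutation test is swapped in as a simple stand-in. The function names, the Gaussian toy "embeddings", the 10% signal fraction, and all parameter values are assumptions made for this example.

```python
import numpy as np


def energy_distance(x, y):
    """Energy distance between samples x (n, d) and y (m, d)."""
    def mean_pairwise(a, b):
        # Mean Euclidean distance over all pairs (one row from a, one from b).
        diffs = a[:, None, :] - b[None, :, :]
        return np.sqrt((diffs ** 2).sum(axis=-1)).mean()

    return 2.0 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)


def permutation_p_value(reference, observed, n_permutations=200, seed=0):
    """Permutation p-value for H0: both samples come from the same distribution."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([reference, observed])
    n_ref = len(reference)
    t_obs = energy_distance(reference, observed)

    exceed = 0
    for _ in range(n_permutations):
        # Shuffle the sample labels and recompute the statistic under H0.
        perm = rng.permutation(len(pooled))
        t_perm = energy_distance(pooled[perm[:n_ref]], pooled[perm[n_ref:]])
        exceed += int(t_perm >= t_obs)

    # The +1 terms keep the estimated p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)


# Toy stand-in for learned embeddings (assumption: 4-dimensional Gaussians).
rng = np.random.default_rng(42)
reference = rng.normal(size=(200, 4))       # background-only reference sample
background_only = rng.normal(size=(200, 4)) # second background-only sample
with_signal = np.vstack([
    rng.normal(size=(180, 4)),              # 90% background
    rng.normal(loc=3.0, size=(20, 4)),      # 10% injected "anomalous" events
])

p_null = permutation_p_value(reference, background_only)
p_signal = permutation_p_value(reference, with_signal)
print(f"p-value, background only: {p_null:.3f}")
print(f"p-value, with injected signal: {p_signal:.3f}")
```

The permutation approach is chosen here because it needs no distributional assumptions, at the cost of a p-value resolution limited to 1 / (n_permutations + 1). Claims at discovery-level significance would need far more permutations or a test statistic with a known null distribution.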