AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing

Samuel Bright-Thonney; Christina Reissel; Gaia Grosso; Nathaniel Woodward; Katya Govorkova; Andrzej Novak; Sang Eon Park; Eric Moreno; Philip Harris

arXiv:2510.21935 · cs.LG · January 26, 2026

TL;DR

AutoSciDACT is a unified pipeline that combines contrastive embedding with hypothesis testing to detect novel phenomena in complex scientific datasets and statistically validate them, addressing the noise and high dimensionality of experimental data.

Contribution

The paper introduces AutoSciDACT, a framework that integrates contrastive pre-training with two-sample testing for robust novelty detection in scientific data.
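
To make the contrastive pre-training step concrete, here is a minimal sketch of a contrastive objective for a single anchor. This page does not say which contrastive loss the paper uses, so the InfoNCE form, the cosine similarity, the `temperature` value, and the function names below are all assumptions chosen for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor (an assumed, generic contrastive objective).

    The loss is the negative log-softmax score of the positive pair among
    the positive and all negatives, so it is small when the anchor is close
    to its positive and far from the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    shift = max(logits)  # subtract the max before exponentiating for stability
    log_norm = shift + math.log(sum(math.exp(l - shift) for l in logits))
    return log_norm - logits[0]


# Hypothetical 2D embeddings: the anchor's positive points in nearly the same direction.
anchor = [1.0, 0.0]
aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned < misaligned)
```

Minimizing a loss of this kind over many anchors pulls similar inputs together in the embedding space, which is what makes a downstream low-dimensional test feasible.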

Findings

01 High sensitivity to small anomalies across diverse datasets

02 Effective statistical quantification of deviations

03 Applicable to multiple scientific domains
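
The statistical quantification of deviations mentioned in the findings can be illustrated with a generic two-sample test on low-dimensional embeddings. The sketch below is a permutation test with the energy distance as test statistic. This is a swapped-in, standard technique and not necessarily the test used in the paper; the function names, sample sizes, and the Gaussian toy data are hypothetical.

```python
import math
import random


def mean_pairwise_dist(xs, ys):
    """Average Euclidean distance over all pairs (x, y), self-pairs included."""
    return sum(math.dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_statistic(xs, ys):
    """Energy distance (V-statistic form); zero when both samples are identical."""
    return (2 * mean_pairwise_dist(xs, ys)
            - mean_pairwise_dist(xs, xs)
            - mean_pairwise_dist(ys, ys))


def permutation_p_value(reference, observed, n_perm=200, seed=0):
    """p-value for the null hypothesis that both samples share one distribution."""
    rng = random.Random(seed)
    t_obs = energy_statistic(reference, observed)
    pooled = list(reference) + list(observed)
    n_ref = len(reference)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel points at random under the null
        if energy_statistic(pooled[:n_ref], pooled[n_ref:]) >= t_obs:
            exceed += 1
    # The +1 terms keep the p-value strictly positive and the test valid.
    return (exceed + 1) / (n_perm + 1)


# Toy embeddings: a reference sample and an observed sample with a shifted mean.
rng = random.Random(1)
reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
observed = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(30)]
print(permutation_p_value(reference, observed, n_perm=100))
```

A small p-value indicates that the observed sample is unlikely to come from the reference distribution, which is the kind of quantifiable statement a discovery claim needs. The smallest p-value this test can report is 1 / (n_perm + 1), so stronger claims require more permutations or an asymptotic calibration.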

Abstract

Novelty detection in large scientific datasets faces two key challenges: the noisy and high-dimensional nature of experimental data, and the necessity of making statistically robust statements about any observed outliers. While there is a wealth of literature on anomaly detection via dimensionality reduction, most methods do not produce outputs compatible with quantifiable claims of scientific discovery. In this work we directly address these challenges, presenting the first step towards a unified pipeline for novelty detection adapted for the rigorous statistical demands of science. We introduce AutoSciDACT (Automated Scientific Discovery with Anomalous Contrastive Testing), a general-purpose pipeline for detecting novelty in scientific data. AutoSciDACT begins by creating expressive low-dimensional data representations using contrastive pre-training, leveraging the abundance of…

Videos

AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing · SlidesLive