TIDMAD: Time Series Dataset for Discovering Dark Matter with AI Denoising

J. T. Fry; Xinyi Hope Fu; Zhenghao Fu; Kaliroe M. W. Pappas; Lindley Winslow; Aobo Li

arXiv:2406.04378 · cs.LG · October 29, 2025

TL;DR

TIDMAD provides a comprehensive dataset, denoising tools, and analysis framework from the ABRACADABRA experiment to aid AI-driven dark matter detection using ultra-long time-series data.

Contribution

It introduces a new dataset, denoising score, and analysis framework specifically designed for AI-based dark matter searches in time-series data.

Findings

01. Dataset enables AI models to detect dark matter signals.

02. Denoising score improves signal extraction accuracy.

03. Framework standardizes dark matter search results.

Abstract

Dark matter makes up approximately 85% of total matter in our universe, yet it has never been directly observed in any laboratory on Earth. The origin of dark matter is one of the most important questions in contemporary physics, and a convincing detection of dark matter would be a Nobel-Prize-level breakthrough in fundamental science. The ABRACADABRA experiment was specifically designed to search for dark matter. Although it has not yet made a discovery, ABRACADABRA has produced several dark matter search results widely endorsed by the physics community. The experiment generates ultra-long time-series data at a rate of 10 million samples per second, where the dark matter signal would manifest itself as a sinusoidal oscillation mode within the ultra-long time series. In this paper, we present the TIDMAD -- a comprehensive data release from the ABRACADABRA experiment including three key…
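As a rough illustration of the signal model the abstract describes (a sinusoidal oscillation buried in a long, noisy time series), the sketch below injects a weak tone into white Gaussian noise and recovers its frequency from the power spectrum. It is a toy under stated assumptions, not the TIDMAD analysis: the sample rate, duration, tone frequency, amplitude, and the helper `peak_frequency` are all hypothetical and far smaller than the experiment's 10 million samples per second.

```python
import numpy as np

def peak_frequency(series, sample_rate):
    """Return the frequency (Hz) of the largest non-DC bin in the power spectrum."""
    spectrum = np.abs(np.fft.rfft(series)) ** 2
    freqs = np.fft.rfftfreq(len(series), d=1.0 / sample_rate)
    spectrum[0] = 0.0  # ignore the DC bin
    return freqs[np.argmax(spectrum)]

# Toy parameters (assumptions, not values from the paper).
sample_rate = 10_000.0   # samples per second
duration = 1.0           # seconds
signal_freq = 1_234.0    # Hz, hypothetical injected tone
amplitude = 0.2          # weak compared with unit-variance noise

rng = np.random.default_rng(0)
t = np.arange(int(sample_rate * duration)) / sample_rate
noise = rng.normal(0.0, 1.0, size=t.size)
series = noise + amplitude * np.sin(2 * np.pi * signal_freq * t)

found = peak_frequency(series, sample_rate)
print(f"injected {signal_freq} Hz, spectral peak at {found} Hz")
```

The tone is invisible sample by sample, yet it concentrates into a single frequency bin whose power grows with the length of the series, which is why long recordings help a narrowband search.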

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

- Interesting physics problem that I have not seen explored at all in ML venues.
- The evaluation criteria is interesting and I also don't think this specific metric has been explored much.
- Potentially an interesting problem for a benchmark.

Weaknesses

- This paper is fundamentally a dark matter instrumentation paper that was submitted to an ML venue. It is not appropriate for consideration in this conference. The authors should work with someone in the field of machine learning, or even someone in the physics and machine learning community, so they learn how to frame contributions so that they would be useful for the broader ICLR audience. Even as a physicist myself, I found the paper difficult to understand or to see how to repurpose the dat

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- Originality: The creation of a dark matter detection dataset and benchmarking framework is novel and addresses a significant challenge in experimental physics.
- Quality: The paper details the dataset structure, model training, and evaluation metrics with supporting code, demonstrating transparency and reproducibility.
- Significance: The dataset can benefit the broader scientific community by enabling more effective signal extraction in various time series-based applications.
- Clarity: The m

Weaknesses

- Baseline Comparisons: The choice of traditional denoising algorithms (e.g., moving average, Savitzky-Golay filter) seems weak, as these performed poorly compared to no processing. Incorporating more sophisticated traditional baselines might strengthen the results.
- Dataset Limitations: The size of the dataset is only 1% of the previous ABRA Run 3 dataset, which may limit the generalizability of conclusions.
- Model Generalizability: The need to train separate models for different frequency ra
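For readers unfamiliar with the two traditional baselines named in this review, the following is a minimal NumPy sketch of a moving average and a Savitzky-Golay smoother. It is not the paper's implementation: the function names, window lengths, polynomial order, and toy tone are assumptions chosen for illustration. (SciPy also provides `scipy.signal.savgol_filter`; the version here stays NumPy-only.)

```python
import numpy as np

def moving_average(x, window):
    """Smooth x with a centered boxcar of odd length `window`."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

def savgol_coefficients(window, order):
    """Savitzky-Golay smoothing weights from a least-squares polynomial fit."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse gives the fitted value at offset 0.
    return np.linalg.pinv(design)[0]

def savitzky_golay(x, window, order):
    """Apply Savitzky-Golay smoothing; edge samples are only approximate."""
    coeffs = savgol_coefficients(window, order)
    return np.convolve(x, coeffs[::-1], mode="same")

# Toy check (hypothetical numbers): a 1 kHz tone sampled at 10 kHz.
sample_rate = 10_000.0
t = np.arange(2_000) / sample_rate
tone = np.sin(2 * np.pi * 1_000.0 * t)

smoothed = moving_average(tone, window=11)
interior = slice(11, -11)  # skip samples affected by the array edges
gain = np.std(smoothed[interior]) / np.std(tone[interior])
print(f"moving-average gain on the tone: {gain:.3f}")
```

Both filters are low-pass smoothers, so a sinusoid whose period is comparable to the window is attenuated along with the noise. In this toy case the moving average removes most of the tone's amplitude, which shows how such a baseline can end up worse than no processing for an oscillatory signal.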

Reviewer 03 · Rating 3 · Confidence 4

Strengths

Building community datasets and benchmarks in scientific domains is a meaningful effort. The newly launched ABRA experiment's initiative to establish shared resources is encouraging, promoting collaboration, transparency, and progress.

Weaknesses

This paper offers little machine learning novelty. The dataset release is better suited for a dataset track or domain-specific venue, rather than ICLR. Several technical details require further clarification, revision, or investigation (see Questions).

Reviewer 04 · Rating 3 · Confidence 4

Strengths

The paper is clearly written, technically correct, and seems reproducible given that one has access to the hardware. The approach is also well-motivated. It is based on the ABRACADABRA experiment, and creates a smaller and possibly more accessible dataset for the community to easily access and run experiments on. The authors create a training set that has ground truth via injecting a sinusoidal signal that spans two orders of magnitude in frequency to represent dark matter signal, and add Gaussi

Weaknesses

The paper does not present novel findings theoretically or algorithmically, but presents a dataset that could potentially bring impactful findings in the future. Historically, benchmark datasets have been the foundation of advancement in the ML community in the recent decades (e.g. Imagenet). This work presents a dataset which could potentially have a large impact, however, the paper would be a better fit for audiences other than the ICLR community. The authors use several architectures and alg

Code & Models

Repositories

jessicafry/tidmad
jax · Official

Taxonomy

Topics: Computational Physics and Python Applications · Dark Matter and Cosmic Phenomena · Functional Brain Connectivity Studies