A public dataset of Ariel simulated observations for developing exoplanetary atmosphere data reduction pipelines

Lorenzo V. Mugnai; Kai Hou Yip; Andrea Bocchieri; Andreas Papageorgiou; Virginie Batista; Orphée Faucoz; Angèle Syty; Tara Tahseen; Enzo Pascale; Ingo Waldmann

arXiv:2605.03719·astro-ph.EP·May 6, 2026

TL;DR

This paper introduces a public dataset of simulated Ariel observations for benchmarking exoplanet atmosphere data reduction methods, together with a neural network baseline and an analysis of the limitations of ML-based detrending.

Contribution

The dataset gives the community a benchmarking resource with known ground truth for developing and validating detrending algorithms in exoplanet spectroscopy.

Findings

01 Neural network baseline highlights ML detrending limitations.

02 Dataset demonstrates robustness and fidelity for Ariel mission simulations.

03 Identifies risks of dataset shift in ML-based detrending methods.

Abstract

Detecting and characterising exoplanet atmospheres remains challenging because atmospheric signals can be comparable to residual noise and instrumental/astrophysical systematics. Spectral features span from a few ppm for small planets up to $\sim 10^{3}$ ppm for warm/hot giants, while high-quality JWST time-series spectroscopy typically reaches $\sim 10$--$50$ ppm (occasionally $\sim 100$--$200$ ppm in the presence of stellar variability or stronger systematics), making correlated noise across temporal and spectral dimensions a key limitation. With JWST delivering an increasing volume of high-precision transmission spectra, and Ariel set to extend this to a homogeneous survey of $\sim 10^{3}$ exoplanet atmospheres, robust benchmarking resources with known ground truth are essential to develop and validate data-driven (including ML-based) detrending approaches. As a major step towards this…
