A public dataset of Ariel simulated observations for developing exoplanetary atmosphere data reduction pipelines
Lorenzo V. Mugnai, Kai Hou Yip, Andrea Bocchieri, Andreas Papageorgiou, Virginie Batista, Orphée Faucoz, Angèle Syty, Tara Tahseen, Enzo Pascale, Ingo Waldmann

TL;DR
This paper introduces a comprehensive public dataset based on Ariel mission simulations to benchmark exoplanet atmosphere data reduction methods, including a neural network baseline and analysis of ML limitations.
Contribution
It provides a new, extensive dataset for benchmarking and validating detrending algorithms in exoplanet spectroscopy, supporting community development.
Findings
Neural network baseline highlights ML detrending limitations.
Dataset demonstrates robustness and fidelity for Ariel mission simulations.
Identifies risks of dataset shift in ML-based detrending methods.
Abstract
Detecting and characterising exoplanet atmospheres remains challenging because atmospheric signals can be comparable to residual noise and instrumental/astrophysical systematics. Spectral features span from a few ppm for small planets up to ppm for warm/hot giants, while high-quality JWST time-series spectroscopy typically reaches -- ppm (occasionally -- ppm in the presence of stellar variability or stronger systematics), making correlated noise across temporal and spectral dimensions a key limitation. With JWST delivering an increasing volume of high-precision transmission spectra, and Ariel set to extend this to a homogeneous survey of exoplanet atmospheres, robust benchmarking resources with known ground truth are essential to develop and validate data-driven (including ML-based) detrending approaches. As a major step towards this…
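The sketch below is a toy Python (NumPy) example of why a known ground truth helps when benchmarking detrending. It is not the paper's pipeline, data format or simulator. The box-shaped transit, the additive quadratic drift, the white noise and every parameter value (100 ppm injected depth, 50 ppm per-point noise) are illustrative assumptions, and the function names are hypothetical. Because the injected depth is known, each estimate can be scored directly against it.

```python
import numpy as np


def simulate_light_curve(depth_ppm=100.0, noise_ppm=50.0, n_points=2000, seed=0):
    """Hypothetical toy light curve: box transit + additive quadratic drift + white noise."""
    rng = np.random.default_rng(seed)
    t = np.linspace(-1.0, 1.0, n_points)           # arbitrary time units, centred on mid-transit
    in_transit = (np.abs(t) < 0.2).astype(float)    # box-shaped transit mask (assumption)
    systematic = 3e-4 * t + 2e-4 * t**2             # assumed instrumental drift, not Ariel-specific
    noise = rng.normal(0.0, noise_ppm * 1e-6, n_points)
    flux = 1.0 + systematic - depth_ppm * 1e-6 * in_transit + noise
    return t, in_transit, flux


def naive_depth(in_transit, flux):
    """Depth in ppm from out-of-transit minus in-transit mean flux, ignoring the drift."""
    out_mean = flux[in_transit == 0].mean()
    in_mean = flux[in_transit == 1].mean()
    return (out_mean - in_mean) * 1e6


def fit_depth(t, in_transit, flux):
    """Jointly fit a quadratic baseline and the transit depth by linear least squares."""
    design = np.column_stack([np.ones_like(t), t, t**2, -in_transit])
    coeffs, *_ = np.linalg.lstsq(design, flux, rcond=None)
    return coeffs[3] * 1e6  # fitted depth, converted to ppm


t, mask, flux = simulate_light_curve()
print(f"naive: {naive_depth(mask, flux):.1f} ppm, joint fit: {fit_depth(t, mask, flux):.1f} ppm")
```

In this setup the naive difference absorbs the curvature of the drift, which makes the out-of-transit flux higher on average than the in-transit flux, and so overestimates the depth by roughly 80 ppm. The joint fit recovers the injected 100 ppm to within the noise. Real systematics are correlated in both time and wavelength and are far less tidy than a single quadratic.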
