Machine Learning for automatic identification of new minor species
Frederic Schmidt, Guillaume Cruz Mermy, Justin Erwin, Severine Robert, Lori Neary, Ian R. Thomas, Frank Daerden, Bojan Ristic, Manish R. Patel, Giancarlo Bellucci, Jose-Juan Lopez-Moreno, Ann-Carine Vandaele

TL;DR
This paper introduces an unsupervised machine learning method using non-negative matrix factorization to automatically identify new minor species in large spectroscopic datasets, reducing manual analysis and improving detection sensitivity.
Contribution
The authors develop a novel unsupervised approach for detecting minor chemical species in large spectral datasets, enabling automatic identification without prior labeling.
Findings
Successfully detects 100 hidden spectra among 10,000 in synthetic data.
Achieves detection limits of 100-500 ppt for methane in simulated NOMAD-SO spectra.
Confirms known molecules and discovers new spectral lines in real Martian data.
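The first finding (recovering a few hidden spectra from a much larger set) can be illustrated with a toy experiment. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the Gaussian line shapes, the dataset size (1,000 spectra with 10 hidden ones, the same 1% ratio as 100 in 10,000), the noise level, the rank of 2, the SVD-based initialisation, the multiplicative update rule, and the sparsity and threshold heuristics are all illustrative choices. The helper names `nndsvd_init` and `nmf` are hypothetical.

```python
import numpy as np

def nndsvd_init(X, rank, fill=1e-4):
    """SVD-based nonnegative start (NNDSVD-style, simplified; an assumption)."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    W = np.zeros((X.shape[0], rank))
    H = np.zeros((rank, X.shape[1]))
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0] = np.sqrt(S[0]) * np.abs(Vt[0])
    for j in range(1, rank):
        u, v = U[:, j], Vt[j]
        up, vp = np.maximum(u, 0), np.maximum(v, 0)
        un, vn = np.maximum(-u, 0), np.maximum(-v, 0)
        # Keep whichever sign pattern carries more of the component.
        if np.linalg.norm(up) * np.linalg.norm(vp) < np.linalg.norm(un) * np.linalg.norm(vn):
            up, vp = un, vn
        W[:, j], H[j] = np.sqrt(S[j]) * up, np.sqrt(S[j]) * vp
    # Replace exact zeros so multiplicative updates can still move them.
    return np.maximum(W, fill), np.maximum(H, fill)

def nmf(X, rank, n_iter=300, eps=1e-12):
    """Factor X ~ W @ H with W, H >= 0 using multiplicative updates."""
    W, H = nndsvd_init(X, rank)
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

rng = np.random.default_rng(0)
n_spectra, n_channels, n_hidden = 1000, 100, 10
channels = np.arange(n_channels)

def line(center, width=2.0):
    """Gaussian absorption line (toy line shape)."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

major = line(20) + line(45) + line(80)   # known species, present everywhere
minor = line(62)                         # hidden species, present in 1% of spectra

a = rng.uniform(0.5, 1.5, n_spectra)     # abundance of the major species
hidden = rng.choice(n_spectra, n_hidden, replace=False)
b = np.zeros(n_spectra)
b[hidden] = rng.uniform(0.6, 1.0, n_hidden)

# Nonnegative absorption-like signal plus noise, clipped at zero.
X = np.outer(a, major) + np.outer(b, minor)
X += rng.normal(0.0, 0.005, X.shape)
X = np.clip(X, 0.0, None)

W, H = nmf(X, rank=2)

# The component whose abundances are mostly near zero is the candidate minor species.
sparsity = np.median(W, axis=0) / W.max(axis=0)
j = int(np.argmin(sparsity))
flagged = np.flatnonzero(W[:, j] > 0.3 * W[:, j].max())

print("flagged spectra:", flagged)
print("true hidden    :", np.sort(hidden))
print("peak channel of candidate source:", int(np.argmax(H[j])))
```

In this sketch the rows of `H` play the role of source spectra and the columns of `W` their per-spectrum abundances. A source that is active in only a handful of spectra stands out through its sparse abundance column, which is the property the flagging heuristic relies on.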
Abstract
One of the main difficulties in analyzing modern spectroscopic datasets is the sheer volume of data. For example, in atmospheric transmittance spectroscopy, the solar occultation (SO) channel of the NOMAD instrument onboard the Trace Gas Orbiter (TGO), the satellite of the ESA ExoMars 2016 mission, produced 10 million spectra in 20,000 acquisition sequences between the beginning of the mission in April 2018 and 15 January 2020. Other datasets are even larger, with billions of spectra for OMEGA onboard Mars Express or CRISM onboard Mars Reconnaissance Orbiter. Usually, new lines are discovered after a long iterative process of model fitting and manual residual analysis. Here we propose a new method based on unsupervised machine learning to automatically detect new minor species. Although precise quantification is out of scope, this tool can also be used to quickly summarize the…
