Searching for Carriers of the Diffuse Interstellar Bands Across Disciplines, using Natural Language Processing
Corentin van den Broek d'Obrenan, Frédéric Galliano, Jeremy Minton, Viktor Botev, Ronin Wu

TL;DR
This paper demonstrates how NLP can be used to analyze large interdisciplinary scientific corpora to identify potential molecular carriers of Diffuse Interstellar Bands, advancing astrophysical research through machine learning techniques.
Contribution
The study introduces a novel NLP-based interdisciplinary approach to identify candidate molecules for DIBs, combining large-scale data analysis with astrophysical insights.
Findings
Identified several molecules, studied primarily in biology, with transitions at the wavelengths of several DIBs.
Highlighted molecules containing chromophores as promising candidates.
Showed NLP's effectiveness in interdisciplinary scientific discovery.
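The wavelength-matching idea behind the first finding can be sketched as a simple tolerance check. This is only an illustrative sketch, not the authors' procedure: the function name `match_transitions`, the tolerance of 2 Å, and the candidate molecules with their transition wavelengths are all hypothetical. The DIB list holds approximate central wavelengths of a few well-known strong bands.

```python
from typing import Dict, List, Tuple

# Approximate central wavelengths (angstroms) of a few well-known strong DIBs.
DIB_WAVELENGTHS_A: List[float] = [4430.0, 5780.0, 5797.0, 6284.0, 6614.0]


def match_transitions(
    candidates: Dict[str, List[float]],
    dibs: List[float],
    tolerance_a: float = 2.0,  # assumed tolerance, not taken from the paper
) -> List[Tuple[str, float, float]]:
    """Return (molecule, transition, dib) triples that agree within the tolerance."""
    matches = []
    for name, transitions in candidates.items():
        for transition in transitions:
            for dib in dibs:
                if abs(transition - dib) <= tolerance_a:
                    matches.append((name, transition, dib))
    return matches


# Hypothetical molecules and transition wavelengths, for illustration only.
candidates = {
    "molecule_A": [5779.2, 7100.0],
    "molecule_B": [4500.0],
    "molecule_C": [6613.5, 6285.9],
}

matches = match_transitions(candidates, DIB_WAVELENGTHS_A)
for name, transition, dib in matches:
    print(f"{name}: transition at {transition} A is near the DIB at {dib} A")
```

A real comparison would also have to account for band widths, line-profile shapes, and the difference between laboratory and interstellar conditions, none of which this sketch models.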
Abstract
The explosion of scientific publications overloads researchers with information. The problem is even more acute for interdisciplinary studies, where several fields need to be explored. One tool that helps researchers overcome it is Natural Language Processing (NLP): a machine-learning (ML) technique that allows scientists to automatically synthesize information from many articles. As a practical example, we have used NLP to conduct an interdisciplinary search for compounds that could be carriers of Diffuse Interstellar Bands (DIBs), a long-standing open question in astrophysics. We have trained an NLP model on a corpus of 1.5 million open-access, cross-domain articles, and fine-tuned this model with a corpus of astrophysical publications about DIBs. Our analysis points us toward several molecules, studied primarily in biology, having transitions at the wavelengths of several DIBs and…
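The two-stage workflow described in the abstract (train on a broad corpus, then adapt to DIB literature) can be illustrated with a toy example. The text here does not specify the model architecture, so this sketch swaps in a plain co-occurrence count model with cosine similarity. Continued counting on the domain corpus with a higher weight stands in for fine-tuning. The corpora, the tokens `pigment_x` and `sugar_y`, and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def update_cooccurrence(vectors, sentences, window=2, weight=1.0):
    """Add weighted co-occurrence counts from the sentences to the word vectors."""
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += weight
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical stand-ins for the cross-domain corpus and the DIB corpus.
general_corpus = [
    "pigment_x absorbs visible light in plants",
    "sugar_y stores energy in plants",
]
dib_corpus = [
    "the dib carrier absorbs visible light in space",
]

vectors = defaultdict(Counter)
update_cooccurrence(vectors, general_corpus)          # "pre-training" stage
update_cooccurrence(vectors, dib_corpus, weight=2.0)  # "fine-tuning" stand-in

# Rank the hypothetical compounds by similarity to the word "carrier".
query = vectors["carrier"]
ranking = sorted(
    ["pigment_x", "sugar_y"],
    key=lambda w: cosine(query, vectors[w]),
    reverse=True,
)
print(ranking)
```

In this toy setup `pigment_x` ranks first because it shares context words ("absorbs", "visible") with "carrier", which is the kind of cross-domain association such a search relies on. A learned embedding model would capture far richer context than raw counts.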
Taxonomy
Topics: SAS software applications and methods
