Augmenting representations with scientific papers
Nicolò Oreste Pinciroli Vago, Rocco Di Tella, Carolina Cuesta-Lázaro, Michael J. Smith, Cecilia Garraffo, Rafael Martínez-Galarza

TL;DR
This paper presents a contrastive learning framework that aligns astrophysical spectra with scientific literature, creating shared representations that enhance physical variable estimation and facilitate target identification.
Contribution
It introduces a novel contrastive pipeline for aligning spectral data with literature, improving physical variable estimation and enabling effective multimodal astrophysical analysis.
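The summary does not state which contrastive objective the pipeline uses. As an illustration only, the sketch below assumes a symmetric InfoNCE (CLIP-style) loss over paired spectrum and text embeddings, where row i of each list belongs to the same source. The function names, the temperature value of 0.07, and the pure-Python implementation are assumptions made for this example, not details taken from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_info_nce(spec_embs, text_embs, temperature=0.07):
    """Assumed CLIP-style loss: average of spectrum-to-text and text-to-spectrum terms."""
    s = [l2_normalize(v) for v in spec_embs]
    t = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of spectrum i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(si, tj)) / temperature for tj in t] for si in s]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for the text-to-spectrum direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy embeddings (hypothetical): correctly paired rows give a much lower loss
# than the same rows with the text order swapped.
aligned_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing a loss of this kind pulls each spectrum embedding toward its paired text and away from the other texts in the batch, which is the sense in which the two modalities come to share a representation space.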
Findings
Achieved 20% Recall@1% in text retrieval from spectra
Improved physical variable estimation by 16-18% over unimodal baselines
Identified high-priority astrophysical targets through outlier analysis
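To make the retrieval figure concrete, the sketch below shows one plausible way to compute Recall@k% for spectrum-to-text retrieval. It assumes each spectrum has exactly one paired text at the same index, that similarity is cosine similarity, and that a query counts as a hit when its paired text ranks within the top k% of all candidate texts. The paper's exact definition is not given in this summary, so the function `recall_at_percent` and the toy embeddings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recall_at_percent(spectrum_embs, text_embs, percent=1.0):
    """Fraction of spectra whose paired text (same index) ranks in the top `percent`% of texts."""
    n = len(text_embs)
    k = max(1, math.ceil(n * percent / 100.0))  # always allow at least one candidate
    hits = 0
    for i, s in enumerate(spectrum_embs):
        sims = [cosine(s, t) for t in text_embs]
        # Zero-based rank of the true text = number of texts strictly more similar
        rank = sum(1 for sim in sims if sim > sims[i])
        if rank < k:
            hits += 1
    return hits / len(spectrum_embs)

# Toy embeddings (hypothetical, 3 spectra and 3 texts in 2 dimensions)
spectra = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
texts = [[1.0, 0.1], [0.1, 1.0], [1.0, 0.0]]
recall_top1 = recall_at_percent(spectra, texts, percent=1.0)   # k = 1 candidate
recall_all = recall_at_percent(spectra, texts, percent=100.0)  # k = 3 candidates
```

Under this definition, random embeddings would score roughly 1% at Recall@1% for a large candidate pool, which gives a baseline for reading the 20% figure reported above.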
Abstract
Astronomers have acquired vast repositories of multimodal data, including images, spectra, and time series, complemented by decades of literature that analyzes astrophysical sources. Still, these data sources are rarely systematically integrated. This work introduces a contrastive learning framework designed to align X-ray spectra with domain knowledge extracted from scientific literature, facilitating the development of shared multimodal representations. Establishing this connection is inherently complex, as scientific texts encompass a broader and more diverse physical context than spectra. We propose a contrastive pipeline that achieves a 20% Recall@1% when retrieving texts from spectra, proving that a meaningful alignment between these modalities is not only possible but capable of accelerating the interpretation of rare or poorly understood sources. Furthermore, the resulting…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Machine Learning in Materials Science
