Multi-Modal Masked Autoencoders for Learning Image-Spectrum Associations for Galaxy Evolution and Cosmology

Morgan Himes; Samiksha Krishnamurthy; Andrew Lizarraga; Srinath Saikrishnan; Vikram Seenivasan; Jonathan Soriano; Ying Nian Wu; Tuan Do

arXiv:2510.22527·astro-ph.IM·October 28, 2025

TL;DR

This paper introduces a multi-modal masked autoencoder that learns a shared representation of galaxy images and spectra. The model reconstructs both modalities from heavily masked inputs and predicts redshift from images alone, targeting large-scale surveys that will yield far more images than spectra.

Contribution

The paper adapts a transformer-based masked autoencoder to embed galaxy images and spectra in a shared representation, and tests it on image reconstruction, spectral reconstruction, and redshift regression.

Findings

01  Recovers galaxy shapes, atomic emission line peaks, and broad continuum slopes from data with 75% of tokens masked.

02  Performs redshift regression from images alone, comparably to or better than prior multi-modal models.

03  Struggles with fine image details and with spectral line strengths.

Abstract

Upcoming surveys will produce billions of galaxy images but comparatively few spectra, motivating models that learn cross-modal representations. We build a dataset of 134,533 galaxy images (HSC-PDR2) and spectra (DESI-DR1) and adapt a Multi-Modal Masked Autoencoder (MMAE) to embed both images and spectra in a shared representation. The MMAE is a transformer-based architecture, which we train by masking 75% of the data and reconstructing missing image and spectral tokens. We use this model to test three applications: spectral and image reconstruction from heavily masked data and redshift regression from images alone. It recovers key physical features, such as galaxy shapes, atomic emission line peaks, and broad continuum slopes, though it struggles with fine image details and line strengths. For redshift regression, the MMAE performs comparably to or better than prior multi-modal models in…
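To make the 75% masking step concrete, here is a minimal sketch of uniform random token masking in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the token counts, the `random_mask` helper, and the choice to mask each modality separately are all hypothetical, since the abstract does not say how tokens are formed or whether the ratio applies per modality or across the joint sequence.

```python
import random


def random_mask(num_tokens, mask_ratio=0.75, seed=0):
    """Split token indices into (visible, masked) lists by uniform random sampling.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_masked = int(round(num_tokens * mask_ratio))
    # Sample without replacement so no index is masked twice.
    masked = sorted(rng.sample(range(num_tokens), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_tokens) if i not in masked_set]
    return visible, masked


# Assumed token counts (not from the paper): 64 image patches, 36 spectrum segments.
NUM_IMAGE_TOKENS = 64
NUM_SPECTRUM_TOKENS = 36

# Assumption: each modality is masked independently at the same ratio.
img_visible, img_masked = random_mask(NUM_IMAGE_TOKENS, seed=1)
spec_visible, spec_masked = random_mask(NUM_SPECTRUM_TOKENS, seed=2)

print(len(img_visible), len(img_masked))
print(len(spec_visible), len(spec_masked))
```

In a masked autoencoder of this general kind, only the visible tokens are passed to the encoder, and the decoder is trained to reconstruct the tokens at the masked indices. Masking images alone or spectra alone at inference time would correspond to the cross-modal tasks the abstract describes, though the exact procedure used in the paper is not given here.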
