The Moon's Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction
Tom Sander, Moritz Tenthoff, Kay Wohlfarth, Christian Wöhler

TL;DR
This paper introduces a unified transformer model for multimodal lunar surface reconstruction, enabling flexible translation between various data types like images, DEMs, and surface normals.
Contribution
It presents a novel single transformer architecture that learns shared representations across multiple lunar data modalities: grayscale images, DEMs, surface normals, and albedo maps.
Findings
The model learns physically plausible relations across modalities.
It demonstrates effective lunar 3D reconstruction and albedo estimation.
The results point to the potential of multimodal learning for planetary surface analysis, in particular for Shape and Albedo from Shading.
Abstract
Multimodal learning is an emerging research topic across multiple disciplines but has rarely been applied to planetary science. In this contribution, we propose a single, unified transformer architecture trained to learn shared representations between multiple sources like grayscale images, Digital Elevation Models (DEMs), surface normals, and albedo maps. The architecture supports flexible translation from any input modality to any target modality. Our results demonstrate that our foundation model learns physically plausible relations across these four modalities. We further identify that image-based 3D reconstruction and albedo estimation (Shape and Albedo from Shading) of lunar images can be formulated as a multimodal learning problem. Our results demonstrate the potential of multimodal learning to solve Shape and Albedo from Shading and provide a new approach for large-scale…
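As a rough illustration of the physical relations linking the four modalities named in the abstract, the following Python sketch derives surface normals from a DEM and renders a shaded image from normals and albedo. It assumes a simple Lambertian reflectance model and a single distant light source, neither of which is taken from the paper; the function names `normals_from_dem` and `lambertian_image` and the parameters `spacing` and `sun_dir` are hypothetical. It covers the forward direction only, whereas Shape and Albedo from Shading is the inverse problem of recovering shape and albedo from the image.

```python
import numpy as np


def normals_from_dem(dem, spacing=1.0):
    """Unit surface normals of a height map z(x, y) sampled on a regular grid.

    Assumption: rows are the y axis, columns are the x axis, and `spacing`
    is the grid step in the same units as the heights.
    """
    # np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x).
    dz_dy, dz_dx = np.gradient(dem, spacing)
    # The (unnormalized) normal of the surface z(x, y) is (-dz/dx, -dz/dy, 1).
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(dem)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


def lambertian_image(normals, albedo, sun_dir):
    """Shaded image under an assumed Lambertian model: I = albedo * max(0, n . s)."""
    s = np.asarray(sun_dir, dtype=float)
    s = s / np.linalg.norm(s)
    # Dot product of every per-pixel normal with the light direction;
    # negative values (surface facing away from the light) are clipped to 0.
    shading = np.clip(normals @ s, 0.0, None)
    return albedo * shading


# Toy example: a flat surface with uniform albedo, lit from directly overhead.
dem = np.zeros((4, 4))
albedo = np.full((4, 4), 0.12)
normals = normals_from_dem(dem)
image = lambertian_image(normals, albedo, sun_dir=(0.0, 0.0, 1.0))
```

In this toy case every normal points straight up, so the rendered image equals the albedo map. With real terrain the shading term varies with slope, which is what makes the image informative about both shape and albedo.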
