LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration
Gokce Inal, Pouyan Navard, Alper Yilmaz

TL;DR
LLaVA-LE introduces a specialized multimodal vision-language model for lunar exploration, leveraging a new large-scale lunar dataset and a two-stage training process to enhance lunar terrain understanding and reasoning capabilities.
Contribution
The paper presents LLaVA-LE, a novel lunar-specific vision-language model trained on the LUCID dataset, with a tailored training curriculum and evaluation benchmarks for planetary science applications.
Findings
LLaVA-LE achieves a 3.3x performance gain over baseline models.
The model achieves a reasoning score of 1.070, surpassing reference scores.
Domain-specific data and instruction tuning significantly improve lunar terrain analysis.
Abstract
Recent advances in multimodal vision-language models (VLMs) have enabled joint reasoning over visual and textual information, yet their application to planetary science remains largely unexplored. A key hindrance is the absence of large-scale datasets that pair real planetary imagery with detailed scientific descriptions. In this work, we introduce LLaVA-LE (Large Language-and-Vision Assistant for Lunar Exploration), a vision-language model specialized for lunar surface and subsurface characterization. To enable this capability, we curate a new large-scale multimodal lunar dataset, LUCID (LUnar Caption Image Dataset), consisting of 96k high-resolution panchromatic images paired with detailed captions describing lunar terrain characteristics, and 81k question-answer (QA) pairs derived from approximately 20k images in the LUCID dataset. Leveraging this dataset, we fine-tune LLaVA using a…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Neural Network Applications
