TL;DR
Mars-Bench is the first comprehensive benchmark for evaluating foundation models on Mars science tasks, enabling systematic assessment across diverse datasets and fostering development of domain-specific models.
Contribution
The paper introduces Mars-Bench, a standardized benchmark with datasets and evaluation protocols for Mars-related tasks, addressing a critical gap in Mars science AI research.
Findings
Mars-specific models outperform general models on benchmark tasks.
Models pre-trained on Earth observation and natural images show limited transferability to Mars tasks.
Domain-adapted pre-training improves Mars model performance.
Abstract
Foundation models have enabled rapid progress across many specialized domains by leveraging large-scale pre-training on unlabeled data, demonstrating strong generalization to a variety of downstream tasks. While such models have gained significant attention in fields like Earth Observation, their application to Mars science remains limited. A key enabler of progress in other domains has been the availability of standardized benchmarks that support systematic evaluation. In contrast, Mars science lacks such benchmarks and standardized evaluation frameworks, which has limited progress toward developing foundation models for Martian tasks. To address this gap, we introduce Mars-Bench, the first benchmark designed to systematically evaluate models across a broad range of Mars-related tasks using both orbital and surface imagery. Mars-Bench comprises 20 datasets spanning classification,…