Toward Open Earth Science as Fast and Accessible as Natural Language

Marquita Ellis; Iksha Gurung; Muthukumaran Ramasubramanian; Rahul Ramachandran

arXiv:2505.15690 · cs.CE · September 15, 2025





TL;DR

This paper asks whether Large Language Models (LLMs) can make natural-language-driven analysis of open earth science data feasible under four constraints: accuracy, latency, cost, and maintainability. It introduces a software framework, reports initial results, and outlines future directions.

Contribution

It provides a foundational software framework, evaluation data and metrics, and initial results for applying LLMs to earth observation data analysis, addressing both technical challenges (accuracy, latency) and practical ones (cost, maintenance burden).
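Because the framework is judged on accuracy, latency, and cost together, it can help to see how one harness might record all three per query. The sketch below is a generic illustration, not the authors' framework: `evaluate`, `QueryResult`, `stub_model`, exact-match scoring, and the per-1,000-token price are all hypothetical choices made for this example, and the stub stands in for a real LLM call.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class QueryResult:
    correct: bool
    latency_s: float
    cost_usd: float


def evaluate(
    model: Callable[[str], tuple[str, int]],
    cases: list[tuple[str, str]],
    usd_per_1k_tokens: float,
) -> dict[str, float]:
    """Score a (hypothetical) model on (question, expected_answer) pairs.

    `model` returns (answer, tokens_used). Correctness is exact string
    match after trimming whitespace, an assumption for this sketch.
    """
    results = []
    for question, expected in cases:
        start = time.perf_counter()
        answer, tokens = model(question)
        elapsed = time.perf_counter() - start
        results.append(
            QueryResult(
                correct=answer.strip() == expected.strip(),
                latency_s=elapsed,
                cost_usd=tokens / 1000 * usd_per_1k_tokens,
            )
        )
    return {
        "accuracy": sum(r.correct for r in results) / len(results),
        "median_latency_s": statistics.median(r.latency_s for r in results),
        "total_cost_usd": sum(r.cost_usd for r in results),
    }


def stub_model(question: str) -> tuple[str, int]:
    # Hypothetical stand-in for an LLM call: returns (answer, tokens_used).
    canned = {"Which band ratio gives NDVI?": "(NIR - Red) / (NIR + Red)"}
    return canned.get(question, "unknown"), 50


cases = [
    ("Which band ratio gives NDVI?", "(NIR - Red) / (NIR + Red)"),
    ("A question the stub cannot answer", "42"),
]
report = evaluate(stub_model, cases, usd_per_1k_tokens=0.002)
print(report)
```

Keeping the three measurements in one record per query makes it easy to compare techniques on the same cases, which matters when a change that raises accuracy also raises latency or cost.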

Findings

1. Achieved near-100% accuracy on 10 of 11 metrics
2. Analyzed the cost, latency, and maintainability of the techniques
3. Identified opportunities for further research and development

Abstract

Is natural-language-driven earth observation data analysis now feasible with the assistance of Large Language Models (LLMs)? For open science in service of public interest, feasibility requires reliably high accuracy, interactive latencies, low (sustainable) costs, open LLMs, and openly maintainable software -- hence, the challenge. What are the techniques and programming system requirements necessary for satisfying these constraints, and what is the corresponding development and maintenance burden in practice? This study lays the groundwork for exploring these questions, introducing an impactful earth science use-case, and providing a software framework with evaluation data and metrics, along with initial results from employing model scaling, prompt-optimization, and inference-time scaling optimization techniques. While we attain high accuracy (near 100%) across 10 of 11 metrics, the…
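The abstract names inference-time scaling among the optimization techniques but does not say which variant was used. One widely used form is self-consistency: sample several candidate answers and keep the most frequent one. The sketch below illustrates only that generic technique; `majority_vote` and `noisy_sampler` are hypothetical names, and the sampler is a deterministic stub rather than an LLM.

```python
from collections import Counter
from typing import Callable


def majority_vote(sample: Callable[[str, int], str], prompt: str, n: int) -> str:
    """Draw n candidate answers and return the most frequent one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [sample(prompt, seed) for seed in range(n)]
    # most_common orders equal counts by first occurrence, so ties are stable.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


def noisy_sampler(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for one sampled LLM answer; wrong on every third draw.
    return "EVI" if seed % 3 == 0 else "NDVI"


print(majority_vote(noisy_sampler, "Which vegetation index?", n=5))  # → NDVI
```

With a single sample the stub returns the wrong answer (`EVI`); with five, the majority recovers `NDVI`. The trade-off is that `n` samples cost roughly `n` model calls, which is exactly the latency and cost pressure the abstract describes.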


Code & Models

Datasets

nasa-impact/EO-via-NLP
dataset · 14 downloads


Taxonomy

Topics: Genetics, Bioinformatics, and Biomedical Research