Toward Open Earth Science as Fast and Accessible as Natural Language
Marquita Ellis, Iksha Gurung, Muthukumaran Ramasubramanian, Rahul Ramachandran

TL;DR
This paper explores the feasibility of using Large Language Models (LLMs) for natural-language-driven analysis of open earth science data, focusing on accuracy, latency, cost, and maintainability. It introduces a software framework and reports initial results and directions for future work.
Contribution
It provides a foundational software framework, evaluation metrics, and initial results for applying LLMs to earth science, addressing technical and practical challenges.
Findings
Achieved near 100% accuracy on 10 of 11 metrics
Analyzed the cost, latency, and maintainability of the optimization techniques used (model scaling, prompt optimization, inference-time scaling)
Identified opportunities for further research and development
Abstract
Is natural-language-driven earth observation data analysis now feasible with the assistance of Large Language Models (LLMs)? For open science in service of the public interest, feasibility requires reliably high accuracy, interactive latencies, low (sustainable) costs, open LLMs, and openly maintainable software; hence the challenge. What techniques and programming system requirements are necessary to satisfy these constraints, and what is the corresponding development and maintenance burden in practice? This study lays the groundwork for exploring these questions by introducing an impactful earth science use case and providing a software framework with evaluation data and metrics, along with initial results from employing model scaling, prompt optimization, and inference-time scaling optimization techniques. While we attain high accuracy (near 100%) across 10 of 11 metrics, the…
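The abstract names inference-time scaling without saying which variant the framework uses. One common variant is majority voting (often called self-consistency): sample several answers for the same prompt and keep the most frequent one. The sketch below illustrates only that generic idea and is not the authors' implementation; `majority_vote` and `fake_generate` are hypothetical names, and `generate` stands in for whatever LLM call a real system would make.

```python
from collections import Counter
from typing import Callable, Hashable, List


def majority_vote(
    generate: Callable[[str, int], Hashable],
    prompt: str,
    n_samples: int = 5,
) -> Hashable:
    """Sample n_samples answers and return the most frequent one.

    `generate` is a hypothetical stand-in for an LLM call; the integer
    argument lets each sample differ (e.g. a sampling seed).
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Each extra sample is another model call, so cost grows linearly
    # with n_samples (and so does latency if calls run sequentially).
    answers: List[Hashable] = [generate(prompt, seed) for seed in range(n_samples)]
    # most_common(1) returns [(answer, count)]; ties go to the answer
    # that was encountered first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


def fake_generate(prompt: str, seed: int) -> str:
    # Hypothetical model that answers "A" except on one of five samples.
    return "B" if seed == 2 else "A"


print(majority_vote(fake_generate, "example question", n_samples=5))  # → A
```

The design trades extra compute for robustness: a single occasional wrong sample is outvoted, but every additional sample adds cost and, unless calls are parallelized, latency. Those are two of the feasibility constraints the abstract lists.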
