RAG for Geoscience: What We Expect, Gaps and Opportunities
Runlong Yu, Shiyuan Luo, Rahul Ghosh, Lingyao Li, Yiqun Xie, Xiaowei Jia

TL;DR
This paper envisions Geo-RAG, a retrieval-augmented generation paradigm tailored to geoscience. It emphasizes multi-modal data retrieval, reasoning, verification, and transparency to address the limitations of current text-centric workflows.
Contribution
The paper introduces Geo-RAG, a modular paradigm that integrates retrieval, reasoning, generation, and verification to support geoscience workflows.
Findings
Supports retrieval of multi-modal Earth data
Enables reasoning under physical constraints
Facilitates verification against models and measurements
Abstract
Retrieval-Augmented Generation (RAG) enhances language models by combining retrieval with generation. However, its current workflow remains largely text-centric, limiting its applicability in geoscience. Many geoscientific tasks are inherently evidence-hungry. Typical examples involve imputing missing observations using analog scenes, retrieving equations and parameters to calibrate models, geolocating field photos based on visual cues, or surfacing historical case studies to support policy analyses. A simple "retrieve-then-generate" pipeline is insufficient for these needs. We envision Geo-RAG, a next-generation paradigm that reimagines RAG as a modular retrieve → reason → generate → verify loop. Geo-RAG supports four core capabilities: (i) retrieval of multi-modal Earth data; (ii) reasoning under physical and domain constraints; (iii) generation…
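The retrieve → reason → generate → verify loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`run_loop`, the `toy_*` stages, `ARCHIVE`), the numeric evidence, and the 0 to 40 °C plausibility range are hypothetical, chosen only to show how a failed verification might feed back into the next retrieval round.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stage signatures; none of these names come from the paper.
Retrieve = Callable[[str, Optional[str]], List[float]]
Reason = Callable[[List[float]], List[float]]
Generate = Callable[[List[float]], Optional[float]]
Verify = Callable[[Optional[float]], Tuple[bool, Optional[str]]]


def run_loop(query: str, retrieve: Retrieve, reason: Reason,
             generate: Generate, verify: Verify,
             max_rounds: int = 3) -> Tuple[Optional[float], int]:
    """Run retrieve -> reason -> generate -> verify until a draft passes."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        evidence = retrieve(query, feedback)   # fetch candidate evidence
        admissible = reason(evidence)          # apply domain constraints
        draft = generate(admissible)           # produce a candidate answer
        ok, feedback = verify(draft)           # check the draft
        if ok:
            return draft, round_no
    return None, max_rounds                    # no draft passed verification


# Toy stages (assumptions for illustration only).
ARCHIVE = [-4.0, 3.5, 4.5]  # made-up water temperature readings in deg C


def toy_retrieve(query: str, feedback: Optional[str]) -> List[float]:
    # Return one reading at first; widen to the full archive after feedback.
    k = 1 if feedback is None else len(ARCHIVE)
    return ARCHIVE[:k]


def toy_reason(evidence: List[float]) -> List[float]:
    # Assumed physical constraint: liquid water readings cannot be below 0 C.
    return [v for v in evidence if v >= 0.0]


def toy_generate(evidence: List[float]) -> Optional[float]:
    # Estimate the missing value as the mean of the admissible readings.
    return sum(evidence) / len(evidence) if evidence else None


def toy_verify(draft: Optional[float]) -> Tuple[bool, Optional[str]]:
    # Accept only drafts inside an assumed plausible range.
    if draft is not None and 0.0 <= draft <= 40.0:
        return True, None
    return False, "widen retrieval"


answer, rounds = run_loop("lake surface temperature", toy_retrieve,
                          toy_reason, toy_generate, toy_verify)
print(answer, rounds)
```

In this toy run the first round retrieves a single reading that the constraint rejects, so verification fails and its feedback widens the second retrieval, which yields an accepted estimate of 4.0. A real system would replace each stage with multi-modal retrievers, physics-aware reasoning, and checks against models or measurements.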
