EO-Gym: A Multimodal, Interactive Environment for Earth Observation Agents

Sai Ma; Zhuang Li; Sichao Li; Xinyue Xu; Ruibiao Zhu; Tony Boston; John A. Taylor

arXiv:2605.01250 · cs.AI · May 5, 2026

TL;DR

EO-Gym is an interactive, executable environment for Earth Observation analysis in which multimodal, tool-using agents reason across space, time, and sensor types.

Contribution

EO-Gym provides a controlled framework and benchmark for interactive EO reasoning, in contrast to the fixed-input, single-turn tasks used by most existing EO benchmarks.

Findings

01

Strong models struggle with interactive EO reasoning, especially across time and modalities.

02

Fine-tuned EO-Gym-4B improves Pass@3 from 0.49 to 0.74.

03

EO-Gym offers a reproducible environment for developing and evaluating EO agents.

Abstract

Earth Observation (EO) analysis is inherently interactive: resolving uncertainty often requires expanding the region of interest, retrieving historical observations, and switching across sensors such as optical and Synthetic Aperture Radar. However, most EO benchmarks collapse this process into fixed-input, single-turn tasks. To address this gap, we present EO-Gym, a controlled executable framework for multimodal, tool-using EO agents. It formulates EO analysis as a Gymnasium-style local geospatial workspace backed by more than 660k multimodal files indexed by location, time, and sensor type, with 35 EO-specialized tools spanning six task families. Built on this environment, we construct EO-Gym-Data, a benchmark of 9,078 trajectories and 34,604 reasoning steps, grounded in eight public EO datasets together with Landsat and Sentinel-2 imagery. Evaluating 10 open and closed VLMs…
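
To make the "Gymnasium-style workspace indexed by location, time, and sensor type" idea concrete, the following is a minimal toy sketch, not the EO-Gym API. The class `ToyEOWorkspace`, the `FileKey` fields, the tool names `switch_sensor`, `fetch_history`, and `answer`, and the file paths are all hypothetical and chosen only for illustration. The `reset`/`step` return shapes follow the general Gymnasium convention of `(observation, info)` and `(observation, reward, terminated, truncated, info)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileKey:
    """Hypothetical index key: one file per (location, time, sensor)."""
    tile: str    # assumed location identifier
    date: str    # ISO date string
    sensor: str  # e.g. "optical" or "sar"


class ToyEOWorkspace:
    """Toy stand-in for an interactive EO workspace (illustrative only)."""

    def __init__(self, index):
        self.index = index  # maps FileKey -> file path
        self.view = None    # the file the agent is currently looking at

    def reset(self, tile, date, sensor):
        self.view = FileKey(tile, date, sensor)
        return self._observe(), {}

    def step(self, action):
        tool, arg = action
        if tool == "answer":
            # Submitting an answer ends the episode; scoring is out of scope here.
            return self._observe(), 0.0, True, False, {"answer": arg}
        if tool == "switch_sensor":
            candidate = FileKey(self.view.tile, self.view.date, arg)
        elif tool == "fetch_history":
            candidate = FileKey(self.view.tile, arg, self.view.sensor)
        else:
            return self._observe(), 0.0, False, False, {"error": f"unknown tool {tool}"}
        if candidate in self.index:
            self.view = candidate
            info = {}
        else:
            # Keep the current view if the requested file does not exist.
            info = {"error": "no file for requested key"}
        return self._observe(), 0.0, False, False, info

    def _observe(self):
        return {"key": self.view, "path": self.index.get(self.view)}


# Usage: start on an optical image, then switch to SAR for the same place and date.
index = {
    FileKey("tile_a", "2020-01-01", "optical"): "tile_a/2020-01-01/optical.tif",
    FileKey("tile_a", "2020-01-01", "sar"): "tile_a/2020-01-01/sar.tif",
    FileKey("tile_a", "2019-01-01", "optical"): "tile_a/2019-01-01/optical.tif",
}
env = ToyEOWorkspace(index)
obs, info = env.reset("tile_a", "2020-01-01", "optical")
obs, reward, terminated, truncated, info = env.step(("switch_sensor", "sar"))
```

The point of the sketch is the contrast drawn in the abstract: instead of receiving one fixed input, the agent issues tool calls that change which observation it sees, and each call is resolved against a local index rather than a live service.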
