Towards Unified Vision Language Models for Forest Ecological Analysis in Earth Observation

Xizhe Xue; Xiao Xiang Zhu

arXiv:2511.16853 · cs.CV · November 24, 2025

TL;DR

This paper introduces REO-Instruct, a novel benchmark dataset for evaluating vision language models on both descriptive and regression tasks in Earth Observation, specifically focusing on forest ecological analysis.

Contribution

It presents the first unified benchmark for EO that combines qualitative understanding with quantitative biophysical variable prediction, bridging perception and scientific inference.

Findings

1. Current VLMs struggle with numeric reasoning in EO tasks.
2. REO-Instruct provides a standardized platform for developing geospatial models.
3. Baseline evaluations reveal significant challenges in scientific regression tasks.

Abstract

Recent progress in vision language models (VLMs) has enabled remarkable perception and reasoning capabilities, yet their potential for scientific regression in Earth Observation (EO) remains largely unexplored. Existing EO datasets mainly emphasize semantic understanding tasks such as captioning or classification and lack benchmarks that align multimodal perception with measurable biophysical variables. To fill this gap, we present REO-Instruct, the first unified benchmark designed for both descriptive and regression tasks in EO. REO-Instruct establishes a cognitively interpretable logic chain in a forest ecological scenario (human activity, land-cover classification, ecological patch counting, and above-ground biomass (AGB) regression), bridging qualitative understanding and quantitative prediction. The dataset integrates co-registered Sentinel-2 and ALOS-2 imagery with structured textual…
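Because the AGB regression task asks a VLM to produce a number inside a text answer, scoring it requires turning free text into a value before any error metric applies. The following is a minimal sketch, not the paper's evaluation protocol: it assumes each answer is free text, naively takes the first number it contains, and reports RMSE, MAE and the fraction of unparseable answers (metrics chosen here for illustration). The helper names `parse_number` and `score_regression` are hypothetical, as are the example answers and the Mg/ha unit.

```python
import math
import re
from typing import Optional

# Hypothetical pattern: an unsigned integer or decimal (AGB is non-negative).
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_number(answer: str) -> Optional[float]:
    """Return the first number in a free-text answer, or None if there is none.

    Thousands separators such as "1,200" are removed before matching.
    This is deliberately naive: any earlier digit in the text wins.
    """
    match = _NUMBER.search(answer.replace(",", ""))
    return float(match.group()) if match else None


def score_regression(answers: list[str], targets: list[float]) -> dict:
    """Compute RMSE and MAE over parseable answers, plus the parse-failure rate."""
    if len(answers) != len(targets):
        raise ValueError("answers and targets must have the same length")
    errors = []
    failures = 0
    for answer, target in zip(answers, targets):
        value = parse_number(answer)
        if value is None:
            failures += 1  # count answers with no usable number
            continue
        errors.append(value - target)
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n) if n else float("nan")
    mae = sum(abs(e) for e in errors) / n if n else float("nan")
    failure_rate = failures / len(answers) if answers else 0.0
    return {"rmse": rmse, "mae": mae, "parse_failure_rate": failure_rate}


if __name__ == "__main__":
    answers = ["The AGB is about 120 Mg/ha.", "Roughly 95.5", "I cannot tell."]
    targets = [100.0, 100.0, 80.0]
    print(score_regression(answers, targets))
```

Reporting the parse-failure rate separately keeps a model that refuses to answer from looking accurate simply because its non-answers were dropped from the error average.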

Taxonomy

Topics: Multimodal Machine Learning Applications · Remote-Sensing Image Classification · Geographic Information Systems Studies