ChatENV: An Interactive Vision-Language Model for Sensor-Guided Environmental Monitoring and Scenario Simulation

Hosam Elgendy, Ahmed Sharshar, Ahmed Aboeitta, and Mohsen Guizani

arXiv:2508.10635·cs.CV·April 20, 2026

TL;DR

ChatENV is an interactive vision-language model that jointly reasons over satellite image pairs and environmental sensor data. It supports "what-if" scenario simulation and rivals or outperforms state-of-the-art temporal models.

Contribution

The paper introduces what the authors describe as the first interactive VLM that integrates satellite imagery with sensor data, together with a 177k-image dataset (152k temporal pairs) and LoRA fine-tuning of Qwen-2.5-VL for environmental reasoning.

Findings

01

Achieves high temporal reasoning accuracy (BERT-F1 of 0.902).

02

Supports interactive scenario-based environmental analysis.

03

Rivals or surpasses state-of-the-art temporal models.

Abstract

Understanding environmental changes from remote sensing imagery is vital for climate resilience, urban planning, and ecosystem monitoring. Yet current vision-language models (VLMs) overlook causal signals from environmental sensors, rely on single-source captions prone to stylistic bias, and lack interactive scenario-based reasoning. We present ChatENV, the first interactive VLM that jointly reasons over satellite image pairs and real-world sensor data. Our framework: (i) creates a 177k-image dataset forming 152k temporal pairs across 62 land-use classes in 197 countries with rich sensor metadata (e.g., temperature, PM10, CO); (ii) annotates data using GPT-4o and Gemini 2.0 for stylistic and semantic diversity; and (iii) fine-tunes Qwen-2.5-VL using efficient Low-Rank Adaptation (LoRA) adapters for chat purposes. ChatENV achieves strong performance in temporal and "what-if" reasoning…
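Step (iii) relies on LoRA, which freezes a pretrained weight matrix W and learns a low-rank update, so the effective weight is W + (alpha / r) · B · A. The sketch below illustrates that general formulation in plain Python. It is not the ChatENV training code: the helper names, the rank `r`, the scaling `alpha`, and the matrix shapes are all illustrative assumptions, and the abstract does not state the configuration actually used.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_lora(d_out, d_in, r, seed=0):
    """Create LoRA factors: A (r x d_in) small random, B (d_out x r) zeros.

    Zero-initialising B means the adapted layer starts out identical
    to the frozen pretrained layer (standard LoRA initialisation).
    """
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    b = [[0.0] * r for _ in range(d_out)]
    return a, b


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # shape d_out x d_in, same as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Number of parameters in the LoRA factors A and B."""
    return r * (d_in + d_out)


# Illustrative sizes only (assumed, not taken from the paper).
frozen_w = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
lora_a, lora_b = init_lora(d_out=4, d_in=4, r=2)
adapted_w = lora_effective_weight(frozen_w, lora_a, lora_b, alpha=16)
```

Under these assumptions, a hypothetical 1024 x 1024 layer adapted at rank 8 trains 8 · (1024 + 1024) = 16,384 parameters instead of 1,048,576, which is the sense in which the adapters are "efficient".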
