AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models
Jiarui Zhang, Junqi Hu, Zurong Mai, Yuhang Chen, Shuohong Lou, Henglian Huang, Lingyuan Zhao, Jianxi Huang, Yutong Lu, Haohuan Fu, Juepeng Zheng

TL;DR
AgroNVILA introduces a perception-reasoning decoupling framework with a large-scale multi-view dataset, significantly improving agricultural reasoning in multimodal large language models by addressing scale confusion and logic drift.
Contribution
The paper presents AgroNVILA, a multi-view agricultural multimodal large language model, together with the AgroOmni dataset and a Perception-Reasoning Decoupling (PRD) architecture aimed at improving spatial understanding and reasoning in precision agriculture.
Findings
Achieved a +15.18% improvement in multi-altitude agricultural reasoning.
Introduced AgroOmni, a large-scale multi-view dataset with 288K samples.
Demonstrated superior performance over state-of-the-art MLLMs in agricultural tasks.
Abstract
Agricultural multimodal reasoning requires robust spatial understanding across varying scales, from ground-level close-ups to top-down UAV and satellite imagery. Existing Multimodal Large Language Models (MLLMs) suffer from a significant "terrestrial-centric" bias, causing scale confusion and logic drift during complex agricultural planning. To address this, we introduce AgroOmni (288K), the first large-scale multi-view training corpus designed to capture diverse spatial topologies and scales in modern precision agriculture. Built on this dataset, we propose AgroNVILA, an MLLM that utilizes a novel Perception-Reasoning Decoupling (PRD) architecture. On the perception side, we incorporate a View-Conditioned Meta-Net (VCMN), which injects macroscopic spatial context into visual tokens, resolving scale ambiguities with minimal computational overhead. On the reasoning side,…
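The abstract does not specify how the View-Conditioned Meta-Net is built, so the following is only a minimal sketch of one common way to condition visual tokens on a view label: a small network maps the view to a shift vector, and that shift is added to every token. The view labels, function names, layer sizes, and the additive form are all assumptions made for illustration, not the authors' implementation.

```python
import random

# Hypothetical view labels; the paper's actual view taxonomy is not given here.
VIEWS = ("ground", "uav", "satellite")


def make_meta_net(num_views, hidden_dim, token_dim, seed=0):
    """Build random weights for a tiny two-layer network (illustrative only)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(num_views)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(token_dim)] for _ in range(hidden_dim)]
    return w1, w2


def view_shift(meta_net, view):
    """Map a view label to one shift vector of size token_dim."""
    w1, w2 = meta_net
    # A one-hot view input times w1 simply selects one row of w1; apply ReLU.
    hidden = [max(0.0, h) for h in w1[VIEWS.index(view)]]
    token_dim = len(w2[0])
    return [sum(hidden[j] * w2[j][k] for j in range(len(hidden)))
            for k in range(token_dim)]


def condition_tokens(tokens, meta_net, view):
    """Add the same view-dependent shift to every visual token (assumed form)."""
    shift = view_shift(meta_net, view)
    return [[t + s for t, s in zip(token, shift)] for token in tokens]


# Two toy 4-dimensional "visual tokens".
tokens = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
net = make_meta_net(num_views=len(VIEWS), hidden_dim=8, token_dim=4)
out = condition_tokens(tokens, net, "uav")
```

Because the shift is computed once per image rather than per token, the extra cost in this sketch is a single small matrix product, which is in the spirit of the "minimal computational overhead" the abstract mentions.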
Taxonomy
Topics: Multimodal Machine Learning Applications · Geographic Information Systems Studies · Smart Agriculture and AI
