MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation

Xiaobei Zhao; Xingqi Lyu; Xin Chen; Xiang Li

arXiv:2512.03958 · cs.RO · January 5, 2026

Open Access

TL;DR

This paper introduces MDE-AgriVLN, an agricultural vision-and-language navigation method that adds a monocular depth estimation module. On the A2A benchmark, it raises Success Rate from 0.23 to 0.32 and lowers Navigation Error from 4.43 m to 4.08 m.

Contribution

It presents a monocular depth estimation (MDE) module that generates depth features from RGB images to assist the decision-maker in multimodal reasoning, bringing depth estimation to vision-and-language navigation for agricultural robots.

Findings

01 Success Rate increased from 0.23 to 0.32

02 Navigation Error decreased from 4.43 m to 4.08 m

03 Achieved state-of-the-art performance on the A2A benchmark
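
To put the two reported figures on a common scale, the short Python sketch below computes the relative change of each metric. The numbers are the ones listed in the findings; the helper name `relative_change` is introduced here purely for illustration and is not taken from the paper.

```python
def relative_change(before: float, after: float) -> float:
    """Return the signed relative change from `before` to `after`."""
    return (after - before) / before


# Figures as reported in the findings above.
sr_change = relative_change(0.23, 0.32)   # Success Rate (higher is better)
ne_change = relative_change(4.43, 4.08)   # Navigation Error in metres (lower is better)

print(f"Success Rate: {sr_change:+.1%}")      # +39.1%
print(f"Navigation Error: {ne_change:+.1%}")  # -7.9%
```

In relative terms, the gain in Success Rate (about 39%) is considerably larger than the reduction in Navigation Error (about 8%).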

Abstract

Agricultural robots serve as powerful assistants across a wide range of agricultural tasks, yet they still rely heavily on manual operation or railway systems for movement. The AgriVLN method and the A2A benchmark pioneered the extension of Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position by following a natural language instruction. Unlike humans, who have binocular vision, most agricultural robots are equipped with only a single camera, and the resulting monocular vision limits their spatial perception. To bridge this gap, we present Agricultural Vision-and-Language Navigation with Monocular Depth Estimation (MDE-AgriVLN), in which we propose an MDE module that generates depth features from RGB images to assist the decision-maker in multimodal reasoning. When evaluated on the A2A benchmark, our MDE-AgriVLN method…
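
The abstract does not say how depth features reach the decision-maker, so the following Python sketch is only one hypothetical way such a hand-off could look and should not be read as the authors' MDE module. It assumes that some monocular depth estimator has already produced a per-pixel depth map in metres (many estimators output relative depth instead), averages it over three vertical bands, and turns the result into a text hint. The names `summarize_depth`, `depth_hint`, and `toy_depth` are all invented for this illustration.

```python
from statistics import mean


def summarize_depth(depth_map, names=("left", "center", "right")):
    """Average a depth map (list of rows, values in metres) over vertical bands."""
    width = len(depth_map[0])
    # Column boundaries that split the image width into len(names) bands.
    bounds = [round(i * width / len(names)) for i in range(len(names) + 1)]
    summary = {}
    for name, start, end in zip(names, bounds, bounds[1:]):
        summary[name] = mean(v for row in depth_map for v in row[start:end])
    return summary


def depth_hint(summary):
    """Format the band averages as a short text hint for a decision-maker."""
    parts = [f"{name}: {dist:.1f} m" for name, dist in summary.items()]
    return "Mean estimated depth by region: " + ", ".join(parts)


# Toy 2x6 depth map standing in for the output of a real depth estimator.
toy_depth = [
    [1.0, 1.2, 3.0, 3.2, 0.8, 0.6],
    [1.0, 1.2, 3.0, 3.2, 0.8, 0.6],
]
summary = summarize_depth(toy_depth)
hint = depth_hint(summary)
print(hint)
```

In this toy input the center band has the largest mean depth, which is the kind of spatial cue a single RGB frame does not state explicitly.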

Taxonomy

Topics: Smart Agriculture and AI · Robot Manipulation and Learning · Multimodal Machine Learning Applications