MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation
Xiaobei Zhao, Xingqi Lyu, Xin Chen, Xiang Li

TL;DR
This paper introduces MDE-AgriVLN, an agricultural vision-and-language navigation method that adds a monocular depth estimation (MDE) module. On the A2A benchmark it raises navigation Success Rate and lowers Navigation Error relative to AgriVLN.
Contribution
It presents a monocular depth estimation module that generates depth features from single-camera RGB images to support the decision-maker of an agricultural vision-and-language navigation robot, pioneering this application in the agricultural domain.
Findings
Success Rate (SR) increased from 0.23 to 0.32
Navigation Error (NE) decreased from 4.43 m to 4.08 m
Achieved state-of-the-art performance on the A2A benchmark
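To make the two reported metrics concrete, the sketch below uses the definitions conventional in VLN benchmarks such as R2R: NE is the Euclidean distance between where the robot stops and the target, and SR is the fraction of episodes whose NE falls within a success threshold. Whether A2A uses exactly these definitions, and what threshold it uses, is not stated here, so `threshold_m` is an assumed parameter. The last lines compute the relative change implied by the reported numbers.

```python
import math

def navigation_error(final_pos, goal_pos):
    """Euclidean distance (metres) between the stopping point and the target."""
    return math.dist(final_pos, goal_pos)

def success_rate(final_positions, goal_positions, threshold_m):
    """Fraction of episodes whose navigation error is within threshold_m.

    threshold_m is an assumption: the A2A success radius is not given here.
    """
    hits = sum(
        navigation_error(f, g) <= threshold_m
        for f, g in zip(final_positions, goal_positions)
    )
    return hits / len(final_positions)

def relative_change(before, after):
    """Signed relative change from `before` to `after`."""
    return (after - before) / before

# Relative change implied by the reported AgriVLN -> MDE-AgriVLN numbers.
sr_gain = relative_change(0.23, 0.32)  # roughly +39% relative (+0.09 absolute)
ne_drop = relative_change(4.43, 4.08)  # roughly -7.9% relative (-0.35 m absolute)
```

Read this way, the SR gain of 0.09 is large in relative terms, while the NE reduction of 0.35 m is a more modest change in average stopping distance.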
Abstract
Agricultural robots serve as powerful assistants across a wide range of agricultural tasks, but they still rely heavily on manual operation or rail systems for movement. The AgriVLN method and the A2A benchmark were the first to extend Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position by following a natural-language instruction. Unlike humans, who have binocular vision, most agricultural robots are equipped with only a single camera, which limits their spatial perception. To bridge this gap, we present Agricultural Vision-and-Language Navigation with Monocular Depth Estimation (MDE-AgriVLN), in which the proposed MDE module generates depth features from RGB images to assist the decision-maker in multimodal reasoning. When evaluated on the A2A benchmark, our MDE-AgriVLN method…
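The abstract does not say how the MDE module represents its depth features. As a purely illustrative sketch, assume a per-pixel depth map (in metres) has already been produced from the RGB frame by some monocular depth estimator, which is not shown. The hypothetical helpers `depth_features` and `to_text` below average-pool that map into a coarse grid and render it as a compact summary that a decision-maker could read alongside the image and instruction. This is not the authors' implementation; the grid size and text format are arbitrary choices.

```python
from typing import List

DepthMap = List[List[float]]  # hypothetical: per-pixel depth in metres, row-major

def depth_features(depth: DepthMap, rows: int = 2, cols: int = 3) -> List[List[float]]:
    """Average-pool a depth map into a rows x cols grid of mean depths."""
    h, w = len(depth), len(depth[0])
    if h < rows or w < cols:
        raise ValueError("depth map is smaller than the pooling grid")
    grid = []
    for r in range(rows):
        r0, r1 = r * h // rows, (r + 1) * h // rows
        row = []
        for c in range(cols):
            c0, c1 = c * w // cols, (c + 1) * w // cols
            cells = [depth[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        grid.append(row)
    return grid

def to_text(grid: List[List[float]]) -> str:
    """Render pooled depths as one line per grid row, e.g. '1.0m 2.0m 4.0m'."""
    return "\n".join(" ".join(f"{d:.1f}m" for d in row) for row in grid)

# Toy 4x6 depth map: near on the left, far on the right.
toy = [[1.0] * 2 + [2.0] * 2 + [4.0] * 2 for _ in range(4)]
summary = to_text(depth_features(toy))
```

A coarse left/centre/right summary like this is one cheap way to hand spatial cues to a reasoning module without changing its input modality; richer options, such as feeding the depth map to a vision encoder, are equally plausible.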
Taxonomy
Topics: Smart Agriculture and AI · Robot Manipulation and Learning · Multimodal Machine Learning Applications
