AgriVLN: Vision-and-Language Navigation for Agricultural Robots
Xiaobei Zhao, Xingqi Lyu, Xiang Li

TL;DR
This paper introduces AgriVLN, a vision-and-language navigation system tailored for agricultural robots, along with a new agricultural benchmark (A2A) to evaluate navigation performance in realistic farming environments.
Contribution
The paper presents the A2A benchmark and a novel AgriVLN baseline that leverages vision-language models for agricultural robot navigation, including an instruction decomposition module.
Findings
AgriVLN achieves state-of-the-art results on the A2A benchmark.
The Subtask List module improves success rate from 0.33 to 0.47.
AgriVLN performs well on short instructions but struggles with longer ones.
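To put the Subtask List finding in perspective, the short sketch below works out the absolute and relative gain implied by the two reported success rates. Only the values 0.33 and 0.47 come from the summary above; the variable names are illustrative assumptions, and nothing here reproduces the authors' evaluation code.

```python
# Success rates as reported in the Findings above.
# Variable names are hypothetical, chosen only for this illustration.
sr_without_subtask_list = 0.33
sr_with_subtask_list = 0.47

# Absolute gain in success rate (percentage points when multiplied by 100).
absolute_gain = sr_with_subtask_list - sr_without_subtask_list

# Relative gain over the configuration without the Subtask List.
relative_gain = absolute_gain / sr_without_subtask_list

print(f"absolute gain: {absolute_gain:.2f}")
print(f"relative gain: {relative_gain:.1%}")
```

Under these numbers the module adds 0.14 to the success rate, which is roughly a 42% relative improvement over the configuration without it.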
Abstract
Agricultural robots have emerged as powerful assistants in agricultural tasks; nevertheless, they still rely heavily on manual operation or untransportable rail systems for movement, resulting in limited mobility and poor adaptability. Vision-and-Language Navigation (VLN) enables robots to navigate to target destinations by following natural language instructions, and has demonstrated strong performance in several domains. However, none of the existing benchmarks or methods is specifically designed for agricultural scenes. To bridge this gap, we propose the Agriculture to Agriculture (A2A) benchmark, containing 1,560 episodes across six diverse agricultural scenes, in which all realistic RGB videos are captured by a front-facing camera on a quadruped robot at a height of 0.38 meters, matching practical deployment conditions. Meanwhile, we propose Vision-and-Language Navigation for Agricultural Robots…
