AgriVLN: Vision-and-Language Navigation for Agricultural Robots

Xiaobei Zhao; Xingqi Lyu; Xiang Li

arXiv:2508.07406 · cs.RO · August 12, 2025

TL;DR

This paper introduces AgriVLN, a vision-and-language navigation system tailored for agricultural robots, along with a new agricultural benchmark (A2A) to evaluate navigation performance in realistic farming environments.

Contribution

The paper presents the A2A benchmark and a novel AgriVLN baseline that leverages vision-language models for agricultural robot navigation, including an instruction decomposition module.

Findings

1. AgriVLN achieves state-of-the-art results on the A2A benchmark.

2. The Subtask List module improves success rate from 0.33 to 0.47.

3. AgriVLN performs well on short instructions but struggles with longer ones.
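
As a quick arithmetic check on the Subtask List finding, the sketch below computes the absolute and relative gain implied by the two reported success rates (0.33 and 0.47). The helper name `improvement` is hypothetical and used only for illustration; it is not from the paper. Only the two success-rate values come from the source.

```python
def improvement(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain) between two success rates.

    `improvement` is an illustrative helper, not part of AgriVLN.
    """
    absolute = improved - baseline
    relative = absolute / baseline
    return absolute, relative


# Success rates reported in the findings: 0.33 before, 0.47 after.
abs_gain, rel_gain = improvement(0.33, 0.47)
print(f"absolute gain: {abs_gain:.2f}")
print(f"relative gain: {rel_gain:.1%}")
```

Under these numbers the gain is about 0.14 in absolute terms, or roughly 42% relative to the starting success rate.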

Abstract

Agricultural robots have emerged as powerful assistants in agricultural tasks, yet they still rely heavily on manual operation or on rail tracks that cannot be relocated for movement, resulting in limited mobility and poor adaptability. Vision-and-Language Navigation (VLN) enables robots to navigate to target destinations by following natural language instructions, and has demonstrated strong performance in several domains. However, none of the existing benchmarks or methods is specifically designed for agricultural scenes. To bridge this gap, we propose the Agriculture to Agriculture (A2A) benchmark, containing 1,560 episodes across six diverse agricultural scenes, in which all realistic RGB videos are captured by a front-facing camera on a quadruped robot at a height of 0.38 meters, matching practical deployment conditions. Meanwhile, we propose Vision-and-Language Navigation for Agricultural Robots…
