Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets

Guillermo Roque; Erika Maquiling; Jose Giovanni Tapia Lopez; Ross Greer

arXiv:2505.03174·cs.RO·May 7, 2025

Open Access

TL;DR

This paper presents an automated system that uses GPS and NLP to generate instruction-action pairs for autonomous vehicle datasets, reducing manual effort and enabling large-scale data collection for vision-language navigation tasks.

Contribution

The paper introduces ADVLAT-Engine, a fully automated prototype system that collects and categorizes GPS-based voice instructions to create diverse Instruction-Action (IA) datasets without human annotation.

Findings

1. Successfully collected and categorized GPS voice instructions into eight classes.
2. Demonstrated automated data collection from mobile GPS applications.
3. Showed potential for scalable, cost-effective IA dataset generation.

Abstract

Instruction-Action (IA) data pairs are valuable for training robotic systems, especially autonomous vehicles (AVs), but having humans manually annotate this data is costly and time-inefficient. This paper explores the potential of using mobile application Global Positioning System (GPS) references and Natural Language Processing (NLP) to automatically generate large volumes of IA commands and responses without having a human generate or retroactively tag the data. In our pilot data collection, by driving to various destinations and collecting voice instructions from GPS applications, we demonstrate a means to collect and categorize the diverse sets of instructions, further accompanied by video data to form complete vision-language-action triads. We provide details on our completely automated data collection prototype system, ADVLAT-Engine. We characterize collected GPS voice…
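
To make the idea of categorizing GPS voice instructions concrete, here is a minimal Python sketch of a keyword-based categorizer that turns timestamped instruction text into instruction-action pairs. It is not the ADVLAT-Engine implementation. The label names, the regular expressions, the `IAPair` record, and the `categorize` and `build_pairs` helpers are all assumptions made for illustration; this summary does not list the paper's eight classes, so the labels below should not be read as those classes.

```python
import re
from dataclasses import dataclass

# Hypothetical action labels and trigger patterns. These are illustrative
# only and are NOT the eight classes reported in the paper.
RULES = [
    ("u_turn", re.compile(r"\bu-?turn\b")),
    ("turn_left", re.compile(r"\bturn left\b")),
    ("turn_right", re.compile(r"\bturn right\b")),
    ("keep_lane", re.compile(r"\b(keep|stay) (left|right)\b")),
    ("continue", re.compile(r"\b(continue|head)\b")),
    ("arrive", re.compile(r"\b(arrive|destination)\b")),
]


@dataclass
class IAPair:
    """One instruction-action pair, keyed by the time the instruction was heard."""
    timestamp_s: float
    instruction: str
    action: str


def categorize(instruction: str) -> str:
    """Return the first matching hypothetical label, or 'other' if none match."""
    text = instruction.lower()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "other"


def build_pairs(transcript):
    """Convert (timestamp_seconds, instruction_text) tuples into IAPair records."""
    return [IAPair(t, s, categorize(s)) for t, s in transcript]


if __name__ == "__main__":
    demo = [
        (12.0, "In 500 feet, turn left onto Main Street"),
        (47.5, "Keep right at the fork"),
        (90.2, "You have arrived at your destination"),
    ]
    for pair in build_pairs(demo):
        print(pair)
```

In a sketch like this, the timestamp is what would let each pair be aligned with the matching video segment to form a vision-language-action triad. Rules are checked in order, so an instruction containing several maneuvers receives only the first matching label.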

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Semantic Web and Ontologies

Methods: Greedy Policy Search · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings