Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets
Guillermo Roque, Erika Maquiling, Jose Giovanni Tapia Lopez, Ross Greer

TL;DR
This paper presents an automated system that uses GPS and NLP to generate instruction-action pairs for autonomous vehicle datasets, reducing manual effort and enabling large-scale data collection for vision-language navigation tasks.
Contribution
It introduces ADVLAT-Engine, a fully automated prototype system that collects and categorizes GPS-based voice instructions to create diverse instruction-action (IA) datasets without human annotation.
Findings
Successfully collected and categorized GPS voice instructions into eight classes.
Demonstrated automated data collection from mobile GPS applications.
Showed potential for scalable, cost-effective IA dataset generation.
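The summary does not say how the eight instruction classes are defined or assigned. As a minimal sketch, assuming the GPS voice prompts have already been transcribed to text, categorization could be done with a simple rule-based keyword matcher. This is a swapped-in technique, not necessarily the authors' NLP pipeline, and the category names and regular expressions below are hypothetical; they are not the paper's eight classes.

```python
import re
from collections import Counter

# Hypothetical categories for illustration only. The paper reports eight
# classes, but their names and definitions are not given in this summary.
# Patterns are checked in insertion order; the first match wins.
CATEGORY_PATTERNS = {
    "turn_left": re.compile(r"\bturn left\b"),
    "turn_right": re.compile(r"\bturn right\b"),
    "keep": re.compile(r"\bkeep (left|right)\b"),
    "merge": re.compile(r"\bmerge\b"),
    "exit": re.compile(r"\bexit\b"),
    "continue": re.compile(r"\b(continue|head) (straight|on|north|south|east|west)\b"),
    "u_turn": re.compile(r"\bu-?turn\b"),
    "arrive": re.compile(r"\b(destination|arrived?)\b"),
}


def categorize(instruction: str) -> str:
    """Return the first matching category label, or "other" if none match."""
    text = instruction.lower()
    for label, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(text):
            return label
    return "other"


def tally(instructions):
    """Count how many transcribed prompts fall into each category."""
    return Counter(categorize(s) for s in instructions)


if __name__ == "__main__":
    prompts = [
        "In 500 feet, turn right onto Main Street",
        "Keep left at the fork",
        "Make a U-turn",
        "Your destination is on the left",
    ]
    print(tally(prompts))
```

A keyword matcher is easy to audit and needs no training data, but it depends on the fixed phrasing of a particular GPS app; a learned text classifier would be the natural alternative if prompts vary more widely.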
Abstract
Instruction-Action (IA) data pairs are valuable for training robotic systems, especially autonomous vehicles (AVs), but having humans manually annotate this data is costly and time-inefficient. This paper explores the potential of using mobile application Global Positioning System (GPS) references and Natural Language Processing (NLP) to automatically generate large volumes of IA commands and responses without having a human generate or retroactively tag the data. In our pilot data collection, by driving to various destinations and collecting voice instructions from GPS applications, we demonstrate a means to collect and categorize the diverse sets of instructions, further accompanied by video data to form complete vision-language-action triads. We provide details on our completely automated data collection prototype system, ADVLAT-Engine. We characterize collected GPS voice…
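To illustrate how a vision-language-action triad might be assembled, the sketch below assumes each GPS voice prompt is timestamped on the same clock as the video frames and as a stream of GPS fixes that carry a course-over-ground heading. For each prompt it collects the frames that follow it and labels the action from the net heading change in that window. The record types (VoicePrompt, GpsFix, Triad), the 10-second window, and the 45-degree turn threshold are hypothetical choices, not details from the paper.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List


@dataclass
class VoicePrompt:
    t: float   # seconds from start of drive when the GPS app spoke (hypothetical clock)
    text: str  # transcribed instruction


@dataclass
class GpsFix:
    t: float            # seconds, same clock as prompts and video frames
    heading_deg: float  # course over ground: 0 = north, increasing clockwise


@dataclass
class Triad:
    frame_indices: List[int]  # vision: indices of video frames in the window
    instruction: str          # language: the GPS voice prompt
    action: str               # action: coarse label from the heading change


def index_range(times, start, end):
    """Indices i with start <= times[i] < end; `times` must be sorted."""
    return range(bisect_left(times, start), bisect_left(times, end))


def heading_change(h0, h1):
    """Signed heading change in degrees, wrapped to [-180, 180); positive = clockwise."""
    return (h1 - h0 + 180.0) % 360.0 - 180.0


def classify_action(fixes, threshold_deg=45.0):
    """Label a window of fixes as left, right, or straight from first-to-last heading."""
    if len(fixes) < 2:
        return "unknown"
    delta = heading_change(fixes[0].heading_deg, fixes[-1].heading_deg)
    if delta > threshold_deg:
        return "right"
    if delta < -threshold_deg:
        return "left"
    return "straight"


def build_triads(prompts, frame_times, fixes, window_s=10.0):
    """Pair each prompt with the frames and GPS fixes in [t, t + window_s)."""
    fix_times = [f.t for f in fixes]
    triads = []
    for p in prompts:
        frames = list(index_range(frame_times, p.t, p.t + window_s))
        window = [fixes[i] for i in index_range(fix_times, p.t, p.t + window_s)]
        triads.append(Triad(frames, p.text, classify_action(window)))
    return triads
```

Using only the net first-to-last heading change keeps the example short, but it cannot reliably separate U-turns from other large turns and ignores speed and lane changes, so a practical labeler would likely need richer motion features.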
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Semantic Web and Ontologies
Methods: Greedy Policy Search · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
