VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments
Yuze Wu, Mo Zhu, Xingxing Li, Yuheng Du, Yuxin Fan, Wenjun Li, Zhichao Han, Xin Zhou, Fei Gao

TL;DR
VLA-AN is a novel onboard framework that enhances drone navigation in complex environments by integrating vision, language, and action modules with safety and efficiency improvements, enabling real-time autonomous flight.
Contribution
The paper introduces a comprehensive VLA framework with a high-fidelity dataset, a progressive training scheme, safety-enhanced action modules, and optimized deployment pipeline for lightweight UAVs.
Findings
Achieves 98.1% success rate in navigation tasks.
8.3x inference throughput improvement on UAVs.
Significantly enhances spatial grounding and scene reasoning.
Abstract
This paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal navigation with reasoning, safety issues with generative action policies, and onboard deployment constraints. First, we construct a high-fidelity dataset utilizing 3D Gaussian Splatting (3D-GS) to effectively bridge the domain gap. Second, we introduce a progressive three-stage training framework that sequentially reinforces scene comprehension, core flight skills, and complex navigation capabilities. Third, we design a lightweight, real-time action module coupled with geometric safety correction. This module ensures fast, collision-free, and stable command generation, mitigating the safety risks inherent…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRobotic Path Planning Algorithms · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
