VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments

Yuze Wu; Mo Zhu; Xingxing Li; Yuheng Du; Yuxin Fan; Wenjun Li; Zhichao Han; Xin Zhou; Fei Gao

arXiv:2512.15258 · cs.RO · December 22, 2025

Open Access

TL;DR

VLA-AN is an onboard Vision-Language-Action framework for drone navigation in complex environments. It pairs a progressively trained model with a lightweight action module that applies geometric safety correction, enabling real-time autonomous flight.

Contribution

The paper introduces a VLA framework comprising a high-fidelity dataset built with 3D Gaussian Splatting, a progressive three-stage training scheme, a safety-enhanced action module, and an optimized deployment pipeline for lightweight UAVs.

Findings

01. Achieves 98.1% success rate in navigation tasks.

02. 8.3x inference throughput improvement on UAVs.

03. Significantly enhances spatial grounding and scene reasoning.

Abstract

This paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal navigation with reasoning, safety issues with generative action policies, and onboard deployment constraints. First, we construct a high-fidelity dataset utilizing 3D Gaussian Splatting (3D-GS) to effectively bridge the domain gap. Second, we introduce a progressive three-stage training framework that sequentially reinforces scene comprehension, core flight skills, and complex navigation capabilities. Third, we design a lightweight, real-time action module coupled with geometric safety correction. This module ensures fast, collision-free, and stable command generation, mitigating the safety risks inherent…
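The abstract does not spell out how the geometric safety correction works, so the following is only an illustrative sketch of one common idea of this kind, not the authors' method: before a velocity command is sent, remove the component that moves the drone toward any obstacle point inside a safety radius. The function name `correct_velocity`, the `safety_radius` parameter, and the point-obstacle representation are all assumptions made for this example.

```python
import math


def correct_velocity(velocity, position, obstacles, safety_radius=1.0):
    """Hypothetical geometric correction (NOT the paper's algorithm).

    velocity, position: 3-tuples (x, y, z).
    obstacles: iterable of 3-tuples, e.g. nearby points from a depth sensor.
    Returns a velocity with the approaching component toward each
    too-close obstacle removed.
    """
    vx, vy, vz = velocity
    for ox, oy, oz in obstacles:
        # Vector from the drone to the obstacle point.
        dx, dy, dz = ox - position[0], oy - position[1], oz - position[2]
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        # Skip obstacles outside the safety radius (and degenerate zero distance).
        if dist == 0.0 or dist >= safety_radius:
            continue
        # Unit direction toward the obstacle.
        nx, ny, nz = dx / dist, dy / dist, dz / dist
        # Speed at which the command closes in on the obstacle.
        approach = vx * nx + vy * ny + vz * nz
        if approach > 0.0:
            # Project out the approaching component; keep the tangential part.
            vx -= approach * nx
            vy -= approach * ny
            vz -= approach * nz
    return (vx, vy, vz)


if __name__ == "__main__":
    # Obstacle 0.5 m ahead on the x-axis: forward motion is removed,
    # sideways motion is kept.
    print(correct_velocity((1.0, 1.0, 0.0), (0.0, 0.0, 0.0), [(0.5, 0.0, 0.0)]))
```

Obstacles are handled one at a time, so with several close obstacles a later projection can partly undo an earlier one; a real system would more likely solve for all constraints jointly. The sketch is meant only to show why a geometric check of this sort is cheap enough to run on every command from a generative policy.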

Taxonomy

Topics: Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications