Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation

Zhenxing Xu; Brikit Lu; Weidong Bao; Zhengqiu Zhu; Junsong Zhang; Hui Yan; Wenhao Lu; Ji Wang

arXiv:2602.15875·cs.RO·February 19, 2026

Open Access

TL;DR

Fly0 introduces a novel framework that separates semantic understanding from geometric planning in aerial navigation, enhancing robustness, reducing latency, and improving success rates in complex environments.

Contribution

The paper presents Fly0, a three-stage decoupled system that integrates multimodal language reasoning with geometric planning for zero-shot aerial navigation.

Findings

01. Outperforms state-of-the-art baselines in success rate by over 20%

02. Reduces navigation error by approximately 50%

03. Operates without continuous MLLM inference, which reduces latency

Abstract

Current Visual-Language Navigation (VLN) methodologies face a trade-off between semantic understanding and control precision. While Multimodal Large Language Models (MLLMs) offer superior reasoning, deploying them as low-level controllers leads to high latency, trajectory oscillations, and poor generalization due to weak geometric grounding. To address these limitations, we propose Fly0, a framework that decouples semantic reasoning from geometric planning. The proposed method operates through a three-stage pipeline: (1) an MLLM-driven module for grounding natural language instructions into 2D pixel coordinates; (2) a geometric projection module that utilizes depth data to localize targets in 3D space; and (3) a geometric planner that generates collision-free trajectories. This mechanism enables robust navigation even when visual contact is lost. By eliminating the need for continuous…
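
Stage (2) of the pipeline, lifting a grounded 2D pixel into 3D using depth, can be illustrated with a standard pinhole back-projection. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the camera model, so the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the depth convention (metres along the optical axis), the camera-to-world pose (`R_wc`, `t_wc`), and the function names are all hypothetical.

```python
import numpy as np


def backproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumes an ideal pinhole camera (no lens distortion) and depth measured
    along the optical axis (z), not along the viewing ray.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m], dtype=float)


def camera_to_world(p_cam, R_wc, t_wc):
    """Transform a camera-frame point into the world frame.

    R_wc (3x3) and t_wc (3,) are an assumed camera-to-world rotation and
    translation, e.g. taken from the vehicle's state estimate.
    """
    return R_wc @ p_cam + t_wc


# Hypothetical values: a target grounded at pixel (400, 300), 5 m away.
p_cam = backproject_pixel(400, 300, 5.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
p_world = camera_to_world(p_cam, np.eye(3), np.array([1.0, 2.0, 3.0]))
print(p_cam, p_world)
```

Once the target is held as a fixed 3D point in the world frame, a geometric planner can keep steering toward it without the target remaining in view, which is consistent with the abstract's claim of robustness when visual contact is lost.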

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Robot Manipulation and Learning