SoraNav: Adaptive UAV Task-Centric Navigation via Zeroshot VLM Reasoning

Hongyu Song, Rishabh Dev Yadav, Cheng Guo, and Wei Pan

arXiv:2510.25191·cs.RO·March 5, 2026

TL;DR

SoraNav enables UAVs to interpret natural language instructions for 3D navigation by integrating visual reasoning and adaptive decision strategies, significantly improving success rates and efficiency in complex environments.

Contribution

The paper introduces SoraNav, a novel framework that combines multi-modal visual annotation and adaptive decision making for zero-shot vision-language navigation of UAVs in 3D spaces.

Findings

01 Outperforms state-of-the-art baselines in success rate and efficiency.

02 Achieves a 39.3% improvement in success rate in complex 3D scenarios.

03 Demonstrates robust real-world UAV navigation using natural language instructions.

Abstract

Autonomous navigation under natural language instructions represents a crucial step toward embodied intelligence, enabling complex task execution in environments ranging from industrial facilities to domestic spaces. However, language-driven 3D navigation for Unmanned Aerial Vehicles (UAVs) requires precise spatial reasoning, a capability inherently lacking in current zero-shot Vision-Language Models (VLMs), which often generate ambiguous outputs and cannot guarantee geometric feasibility. Furthermore, existing Vision-Language Navigation (VLN) methods are predominantly tailored for 2.5D ground robots, so they do not generalize to the unconstrained 3D spatial reasoning required for aerial tasks in small-scale, cluttered environments. In this paper, we present SoraNav, a novel framework enabling zero-shot VLM reasoning for UAV task-centric navigation. To address the…
