GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics

Qianzhong Chen; Naixiang Gao; Suning Huang; JunEn Low; Timothy Chen; Jiankai Sun; Mac Schwager

arXiv:2506.14009 · cs.RO · May 19, 2026

TL;DR

GRaD-Nav++ is a lightweight drone navigation system that runs fully onboard and follows natural-language commands using a vision-language model. It is trained in simulation with differentiable reinforcement learning and reaches 83% success on trained tasks in simulation and 67% on real hardware.

Contribution

Introduces GRaD-Nav++, a lightweight Vision-Language-Action (VLA) framework that enables real-time, onboard, language-guided drone navigation without external infrastructure.

Findings

01 Achieves 83% success on trained tasks and 75% on unseen tasks in simulation.

02 Attains 67% success on trained tasks and 50% on unseen tasks on real hardware.

03 Demonstrates strong generalization across multiple simulated and real-world environments.

Abstract

Autonomous drones capable of interpreting and executing high-level language instructions in unstructured environments remain a long-standing goal. Yet existing approaches are constrained by their dependence on hand-crafted skills, extensive parameter tuning, or computationally intensive models unsuitable for onboard use. We introduce GRaD-Nav++, a lightweight Vision-Language-Action (VLA) framework that runs fully onboard and follows natural-language commands in real time. Our policy is trained in a photorealistic 3D Gaussian Splatting (3DGS) simulator via Differentiable Reinforcement Learning (DiffRL), enabling efficient learning of low-level control from visual and linguistic inputs. At its core is a Mixture-of-Experts (MoE) action head, which adaptively routes computation to improve generalization while mitigating forgetting. In multi-task generalization experiments, GRaD-Nav++…
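The Mixture-of-Experts action head mentioned in the abstract can be illustrated with a small sketch. This is not the authors' architecture, which the abstract does not specify. It only shows the general idea of a gate that scores several experts, keeps the top few, and blends their outputs. The class name `MoEActionHead`, the linear experts, the top-k routing rule, and all dimensions (8 input features, 4 action outputs, 4 experts, top 2) are assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    # Dense layer: one output per row of `weights`.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def make_linear(rng, out_dim, in_dim):
    # Small random weights and zero bias (hypothetical initialization).
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class MoEActionHead:
    """Illustrative top-k mixture-of-experts head (assumed design)."""

    def __init__(self, feature_dim, action_dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.action_dim = action_dim
        self.gate = make_linear(rng, num_experts, feature_dim)
        self.experts = [make_linear(rng, action_dim, feature_dim)
                        for _ in range(num_experts)]

    def route(self, features):
        # Score every expert, keep the top_k, renormalize their weights.
        probs = softmax(linear(*self.gate, features))
        chosen = sorted(range(len(probs)), key=lambda i: probs[i],
                        reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in chosen)
        return {i: probs[i] / norm for i in chosen}

    def __call__(self, features):
        # Only the selected experts are evaluated; outputs are blended.
        weights = self.route(features)
        action = [0.0] * self.action_dim
        for i, w in weights.items():
            out = linear(*self.experts[i], features)
            action = [a + w * o for a, o in zip(action, out)]
        return action, weights


# Stand-in for fused visual and language features (made-up values).
head = MoEActionHead(feature_dim=8, action_dim=4, num_experts=4, top_k=2)
features = [0.5] * 8
action, weights = head(features)
```

Because only the selected experts run for a given input, different commands or scenes can draw on different experts, which is one common motivation for using such a head when generalization and forgetting are concerns.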

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization