GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics
Qianzhong Chen, Naixiang Gao, Suning Huang, JunEn Low, Timothy Chen, Jiankai Sun, Mac Schwager

TL;DR
GRaD-Nav++ is a lightweight drone navigation system that runs fully onboard, interprets natural-language commands with a vision-language model, and is trained in simulation with differentiable reinforcement learning. It generalizes to both trained and unseen tasks across multiple simulated and real-world environments.
Contribution
Introduces GRaD-Nav++, a lightweight Vision-Language-Action (VLA) framework that enables real-time, fully onboard, language-guided drone navigation without external infrastructure.
Findings
Achieves 83% success on trained tasks and 75% on unseen tasks in simulation.
Attains 67% success on trained tasks and 50% on unseen tasks on real hardware.
Demonstrates strong generalization across multiple simulated and real-world environments.
Abstract
Autonomous drones capable of interpreting and executing high-level language instructions in unstructured environments remain a long-standing goal. Yet existing approaches are constrained by their dependence on hand-crafted skills, extensive parameter tuning, or computationally intensive models unsuitable for onboard use. We introduce GRaD-Nav++, a lightweight Vision-Language-Action (VLA) framework that runs fully onboard and follows natural-language commands in real time. Our policy is trained in a photorealistic 3D Gaussian Splatting (3DGS) simulator via Differentiable Reinforcement Learning (DiffRL), enabling efficient learning of low-level control from visual and linguistic inputs. At its core is a Mixture-of-Experts (MoE) action head, which adaptively routes computation to improve generalization while mitigating forgetting. In multi-task generalization experiments, GRaD-Nav++…
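The abstract describes a Mixture-of-Experts action head that "adaptively routes computation". The sketch below shows one common way such routing can work: a gate scores every expert, the top-scoring experts are kept, and their outputs are blended by renormalized gate weights. It is an illustrative assumption, not the paper's architecture. The function name moe_action_head, the top-k softmax gate, the linear experts, and all dimensions and weights are hypothetical; the source does not specify the number of experts or the routing rule.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def moe_action_head(features, gate_weights, expert_weights, top_k=2):
    """Hypothetical top-k MoE head (assumed design, not the paper's).

    features:       fused visual/language feature vector
    gate_weights:   one row of gate parameters per expert
    expert_weights: one (action_dim x feature_dim) matrix per expert
    Returns the blended action and the indices of the experts used.
    """
    gate_probs = softmax(matvec(gate_weights, features))
    # Keep only the top_k experts with the highest gate probability.
    top = sorted(range(len(gate_probs)),
                 key=lambda i: gate_probs[i], reverse=True)[:top_k]
    norm = sum(gate_probs[i] for i in top)
    action = [0.0] * len(expert_weights[0])
    for i in top:
        expert_out = matvec(expert_weights[i], features)
        weight = gate_probs[i] / norm  # renormalize over selected experts
        action = [a + weight * o for a, o in zip(action, expert_out)]
    return action, top


# Toy example: 3 experts, 2 input features, 1 action dimension.
features = [1.0, 0.0]
gate_weights = [[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
expert_weights = [[[1.0, 0.0]], [[3.0, 0.0]], [[100.0, 0.0]]]
action, used = moe_action_head(features, gate_weights, expert_weights)
print(used, action)
```

In this toy setup the third expert has the lowest gate score, so it is skipped entirely and its large output never reaches the action. Only the selected experts are evaluated, which is the sense in which routing saves computation.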
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization
