QuadAgent: A Responsive Agent System for Vision-Language Guided Quadrotor Agile Flight
Ao Zhuang, Feng Yu, Tianbao Zhang, Linzuo Zhang, Danping Zou

TL;DR
QuadAgent is a training-free, vision-language guided quadrotor system that decouples high-level reasoning from low-level control, enabling agile, safe, instruction-following flight in cluttered environments.
Contribution
It introduces a training-free, multi-agent architecture with scene memory and obstacle avoidance for responsive quadrotor navigation.
Findings
In simulation, outperforms baseline methods in efficiency and responsiveness.
Successfully navigates cluttered indoor spaces at speeds up to 5 m/s in real-world tests.
Maintains scene understanding with the lightweight Impression Graph.
Abstract
We present QuadAgent, a training-free agent system for agile quadrotor flight guided by vision-language inputs. Unlike prior end-to-end or serial agent approaches, QuadAgent decouples high-level reasoning from low-level control using an asynchronous multi-agent architecture: Foreground Workflow Agents handle active tasks and user commands, while Background Agents perform look-ahead reasoning. The system maintains scene memory via the Impression Graph, a lightweight topological map built from sparse keyframes, and ensures safe flight with a vision-based obstacle avoidance network. Simulation results show that QuadAgent outperforms baseline methods in efficiency and responsiveness. Real-world experiments demonstrate that it can interpret complex instructions, reason about its surroundings, and navigate cluttered indoor spaces at speeds up to 5 m/s.
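To make the idea of a lightweight topological map built from sparse keyframes more concrete, here is a minimal Python sketch. It is not the authors' Impression Graph: the class names (`Keyframe`, `TopologicalMap`), the `min_spacing` distance threshold, the text `description` field, and the breadth-first route search are all assumptions chosen for illustration. The abstract does not say how keyframes are selected or how the graph is queried.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional
import math


@dataclass
class Keyframe:
    node_id: int
    position: tuple          # (x, y, z) in metres
    description: str         # hypothetical: a short scene caption for this view


@dataclass
class TopologicalMap:
    min_spacing: float = 1.0                 # hypothetical sparsity threshold (metres)
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)
    last_id: Optional[int] = None

    def maybe_add_keyframe(self, position, description):
        """Add a node only if the vehicle moved far enough from the last keyframe."""
        if self.last_id is not None:
            last_pos = self.nodes[self.last_id].position
            if math.dist(position, last_pos) < self.min_spacing:
                return None  # too close: skip to keep the map sparse
        node_id = len(self.nodes)
        self.nodes[node_id] = Keyframe(node_id, tuple(position), description)
        self.edges[node_id] = set()
        if self.last_id is not None:
            # Consecutive keyframes were flown between, so link them.
            self.edges[node_id].add(self.last_id)
            self.edges[self.last_id].add(node_id)
        self.last_id = node_id
        return node_id

    def shortest_path(self, start, goal):
        """Breadth-first search over keyframe links; returns node ids or None."""
        parents = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                path = []
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            for neighbour in sorted(self.edges[current]):
                if neighbour not in parents:
                    parents[neighbour] = current
                    queue.append(neighbour)
        return None
```

In this sketch, a pose that lies within `min_spacing` of the previous keyframe is discarded, which is one simple way to keep the map sparse. A reasoning agent could then look up nodes by their descriptions and request a route between two node ids.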
