QuadAgent: A Responsive Agent System for Vision-Language Guided Quadrotor Agile Flight

Ao Zhuang; Feng Yu; Tianbao Zhang; Linzuo Zhang; Danping Zou

arXiv:2604.02786 · cs.RO · April 6, 2026

TL;DR

QuadAgent is a training-free, vision-language guided quadrotor agent system that decouples high-level reasoning from low-level control, enabling agile, safe, instruction-following flight in cluttered environments.

Contribution

It introduces a training-free, asynchronous multi-agent architecture that combines scene memory (the Impression Graph) with vision-based obstacle avoidance for responsive quadrotor navigation.

Findings

01

In simulation, outperforms baseline methods in efficiency and responsiveness.

02

Successfully navigates cluttered indoor spaces at speeds up to 5 m/s in real-world tests.

03

Maintains scene memory with the Impression Graph, a lightweight topological map built from sparse keyframes.
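
To make the idea of a topological map built from sparse keyframes concrete, here is a minimal sketch. It is not the paper's Impression Graph: the class names `Keyframe` and `TopoMap`, the distance-based keyframe rule, the `min_spacing` threshold, the text `caption` field, and the breadth-first routing are all assumptions chosen for illustration.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Keyframe:
    node_id: int
    position: Vec3
    caption: str  # hypothetical: short text summary of the camera view


@dataclass
class TopoMap:
    min_spacing: float = 2.0  # assumed keyframe spacing in metres
    nodes: Dict[int, Keyframe] = field(default_factory=dict)
    edges: Dict[int, Set[int]] = field(default_factory=dict)
    last_id: Optional[int] = None

    def maybe_add(self, position: Vec3, caption: str) -> Optional[int]:
        """Store a keyframe only if the vehicle moved far enough (keeps the map sparse)."""
        if self.last_id is not None:
            moved = math.dist(position, self.nodes[self.last_id].position)
            if moved < self.min_spacing:
                return None
        node_id = len(self.nodes)
        self.nodes[node_id] = Keyframe(node_id, position, caption)
        self.edges[node_id] = set()
        if self.last_id is not None:
            # Link consecutive keyframes: the vehicle actually flew between them.
            self.edges[node_id].add(self.last_id)
            self.edges[self.last_id].add(node_id)
        self.last_id = node_id
        return node_id

    def route(self, start: int, goal: int) -> Optional[List[int]]:
        """Breadth-first search for the path with the fewest hops."""
        prev: Dict[int, Optional[int]] = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                path: List[int] = []
                node: Optional[int] = cur
                while node is not None:
                    path.append(node)
                    node = prev[node]
                return path[::-1]
            for nxt in sorted(self.edges[cur]):
                if nxt not in prev:
                    prev[nxt] = cur
                    queue.append(nxt)
        return None
```

In this sketch the map grows with distance travelled rather than with frame rate, which is one plausible way to keep scene memory small enough for a reasoning agent to query by caption and then route over by node id.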

Abstract

We present QuadAgent, a training-free agent system for agile quadrotor flight guided by vision-language inputs. Unlike prior end-to-end or serial agent approaches, QuadAgent decouples high-level reasoning from low-level control using an asynchronous multi-agent architecture: Foreground Workflow Agents handle active tasks and user commands, while Background Agents perform look-ahead reasoning. The system maintains scene memory via the Impression Graph, a lightweight topological map built from sparse keyframes, and ensures safe flight with a vision-based obstacle avoidance network. Simulation results show that QuadAgent outperforms baseline methods in efficiency and responsiveness. Real-world experiments demonstrate that it can interpret complex instructions, reason about its surroundings, and navigate cluttered indoor spaces at speeds up to 5 m/s.
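
The decoupling described in the abstract can be illustrated with a small `asyncio` sketch. It is a toy under stated assumptions, not the authors' system: the function names, the shared `state` dictionary, the string "plans", and the sleep that stands in for slow model inference are all hypothetical. The point it shows is that a fixed-rate control loop keeps ticking while a background agent reasons ahead, and a foreground agent can answer a command immediately from the precomputed result.

```python
import asyncio


async def control_loop(state, period=0.01):
    # Low-level loop: reads the latest goal each tick and never waits on reasoning.
    while not state["stop"]:
        state["log"].append(state["goal"])
        await asyncio.sleep(period)


async def foreground_agent(state, commands):
    # Handles user commands as they arrive and updates the active goal.
    while True:
        cmd = await commands.get()
        if cmd is None:  # sentinel: no more commands
            break
        # Reuse a plan the background agent prepared, if one exists.
        state["goal"] = state["lookahead"].get(cmd, cmd)


async def background_agent(state, candidates, latency=0.05):
    # Look-ahead reasoning: precompute plans for likely next commands.
    for cmd in candidates:
        await asyncio.sleep(latency)  # stands in for a slow model call
        state["lookahead"][cmd] = "plan:" + cmd


async def run_demo():
    state = {"goal": "hover", "lookahead": {}, "log": [], "stop": False}
    commands = asyncio.Queue()
    control = asyncio.create_task(control_loop(state))
    foreground = asyncio.create_task(foreground_agent(state, commands))
    background = asyncio.create_task(background_agent(state, ["go to door"]))

    await background               # control keeps ticking during this wait
    await commands.put("go to door")
    await commands.put(None)
    await foreground
    await asyncio.sleep(0.03)      # let the control loop pick up the new goal
    state["stop"] = True
    await control
    return state


if __name__ == "__main__":
    final = asyncio.run(run_demo())
    print(final["log"][0], "->", final["log"][-1], f"({len(final['log'])} ticks)")
```

A serial design would instead call the slow reasoning step inside the control loop, stalling every tick for the full model latency; running the agents as separate tasks avoids that stall in this toy setting.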
