"Hi AirStar, Guide Me to the Badminton Court."
Ziqin Wang, Jinyu Chen, Xiangyi Zheng, Qinan Liao, Linjiang Huang, Si Liu

TL;DR
AirStar is a UAV platform that uses a large language model for natural interaction and environmental understanding, and supports tasks such as navigation, filming, and tracking, so that users can direct an aerial assistant without a remote controller.
Contribution
This work introduces AirStar, a UAV system that pairs LLM-based cognition with natural voice and gesture control and multi-modal capabilities, a step toward a general-purpose intelligent UAV agent.
Findings
Achieved accurate vision-and-language navigation (VLN) in UAVs.
Enabled natural voice and gesture interaction for UAV control.
Supported diverse functionalities like question answering and target tracking.
Abstract
Unmanned Aerial Vehicles, operating in environments with relatively few obstacles, offer high maneuverability and full three-dimensional mobility. This allows them to rapidly approach objects and perform a wide range of tasks often challenging for ground robots, making them ideal for exploration, inspection, aerial imaging, and everyday assistance. In this paper, we introduce AirStar, a UAV-centric embodied platform that turns a UAV into an intelligent aerial assistant: a large language model acts as the cognitive core for environmental understanding, contextual reasoning, and task planning. AirStar accepts natural interaction through voice commands and gestures, removing the need for a remote controller and significantly broadening its user base. It combines geospatial knowledge-driven long-distance navigation with contextual reasoning for fine-grained short-range control, resulting in…
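The abstract describes two navigation regimes: long-distance navigation driven by geospatial knowledge, and fine-grained short-range control driven by contextual reasoning. The sketch below is one possible way to picture a handoff between the two, and it is not taken from the paper. The distance-based rule, the 30 m threshold, and every name in it (`Waypoint`, `SHORT_RANGE_M`, `choose_mode`) are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Waypoint:
    """A position in a local metric frame (metres). Hypothetical type."""
    x: float
    y: float
    z: float


# Assumed handoff radius; the paper's abstract does not state one.
SHORT_RANGE_M = 30.0


def distance(a: Waypoint, b: Waypoint) -> float:
    """Euclidean distance between two waypoints, in metres."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def choose_mode(current: Waypoint, goal: Waypoint,
                short_range_m: float = SHORT_RANGE_M) -> str:
    """Pick a navigation regime from the remaining distance to the goal.

    Assumption: far from the goal, a map-based long-range planner is used;
    within the handoff radius, a fine-grained short-range controller takes over.
    """
    if distance(current, goal) <= short_range_m:
        return "short_range"
    return "long_range"
```

For example, under these assumptions `choose_mode(Waypoint(0, 0, 10), Waypoint(300, 400, 10))` selects the long-range regime, because the goal is 500 m away, while a goal 5 m away selects the short-range one.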
