"Hi AirStar, Guide Me to the Badminton Court."

Ziqin Wang; Jinyu Chen; Xiangyi Zheng; Qinan Liao; Linjiang Huang; Si Liu

arXiv:2507.04430·cs.RO·July 8, 2025

TL;DR

AirStar is a UAV platform that uses a large language model for natural interaction, environmental understanding, and tasks such as navigation, filming, and tracking, so that users can direct an aerial assistant without a remote controller.

Contribution

This work introduces AirStar, a UAV system that uses an LLM as its cognitive core and accepts voice and gesture commands, a step toward a general-purpose intelligent UAV agent.

Findings

01  Achieved accurate vision-and-language navigation (VLN) in UAVs.

02  Enabled natural voice and gesture interaction for UAV control.

03  Supported diverse functionalities like question answering and target tracking.

Abstract

Unmanned Aerial Vehicles, operating in environments with relatively few obstacles, offer high maneuverability and full three-dimensional mobility. This allows them to rapidly approach objects and perform a wide range of tasks often challenging for ground robots, making them ideal for exploration, inspection, aerial imaging, and everyday assistance. In this paper, we introduce AirStar, a UAV-centric embodied platform that turns a UAV into an intelligent aerial assistant: a large language model acts as the cognitive core for environmental understanding, contextual reasoning, and task planning. AirStar accepts natural interaction through voice commands and gestures, removing the need for a remote controller and significantly broadening its user base. It combines geospatial knowledge-driven long-distance navigation with contextual reasoning for fine-grained short-range control, resulting in…
