APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation
Daoxuan Zhang, Ping Chen, Xiaobo Xia, Xiu Su, Ruichen Zhen, Jianqiang Xiao, Shuo Yang

TL;DR
APEX is a hierarchical, asynchronous UAV exploration agent that uses a vision-language model for dynamic mapping and reinforcement learning for action decisions, significantly improving aerial object goal navigation performance.
Contribution
The paper introduces APEX, a novel hierarchical framework combining dynamic semantic mapping, reinforcement learning, and open-vocabulary target grounding for aerial navigation.
Findings
Outperforms previous methods by +4.2% in success rate (SR) and +2.8% in Success weighted by Path Length (SPL) on UAV-ON benchmarks.
Effectively integrates VLM-based mapping with reinforcement learning for robust control.
Demonstrates superior efficiency and generalization in complex aerial environments.
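For readers unfamiliar with the two reported metrics, the sketch below computes SR and SPL using their standard definitions from the embodied navigation literature: SR is the fraction of successful episodes, and SPL weights each success by the ratio of the shortest path length to the path length the agent actually travelled. This is only an illustration. The `Episode` record and its field names are hypothetical, and the exact evaluation protocol used on UAV-ON (for example, the success criterion) is not described in this summary.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are assumptions."""
    success: bool         # whether the agent reached the target
    shortest_path: float  # shortest-path length from start to goal
    agent_path: float     # length of the path the agent actually flew


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """SPL: mean over episodes of success * shortest / max(agent, shortest)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if not e.success:
            continue  # failed episodes contribute 0
        denom = max(e.agent_path, e.shortest_path)
        # If both lengths are zero the agent started at the goal; count it as 1.
        total += e.shortest_path / denom if denom > 0 else 1.0
    return total / len(episodes)


episodes = [
    Episode(success=True, shortest_path=10.0, agent_path=10.0),   # optimal path
    Episode(success=True, shortest_path=10.0, agent_path=20.0),   # twice as long
    Episode(success=False, shortest_path=10.0, agent_path=5.0),   # failure
]
print(f"SR  = {success_rate(episodes):.3f}")
print(f"SPL = {spl(episodes):.3f}")
```

Because SPL penalizes detours, an agent can raise SR without raising SPL by the same amount, which is why the two gains are reported separately.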
Abstract
Aerial Object Goal Navigation, a challenging frontier in Embodied AI, requires an Unmanned Aerial Vehicle (UAV) agent to autonomously explore, reason, and identify a specific target using only visual perception and language description. However, existing methods struggle to memorize complex spatial representations in aerial environments, to make reliable and interpretable action decisions, and to explore and gather information efficiently. To address these challenges, we introduce APEX (Aerial Parallel Explorer), a novel hierarchical agent designed for efficient exploration and target acquisition in complex aerial settings. APEX is built upon a modular, three-part architecture: 1) Dynamic Spatio-Semantic Mapping Memory, which leverages the zero-shot capability of a Vision-Language Model (VLM) to dynamically construct high-resolution 3D Attraction, Exploration,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms
