HALO: High-Altitude Language-Conditioned Monocular Aerial Exploration and Navigation
Yuezhan Tao, Dexter Ong, Fernando Cladera, Jason Hughes, Camillo J. Taylor, Pratik Chaudhari, Vijay Kumar

TL;DR
HALO performs real-time, high-altitude metric-semantic mapping and exploration with a monocular camera, GPS, and an IMU, allowing an aerial robot to complete tasks specified in natural language across large outdoor environments while traveling less than a state-of-the-art semantic exploration baseline.
Contribution
This work introduces HALO, a system for real-time dense 3D reconstruction and metric-semantic mapping from monocular aerial imagery paired with GPS and IMU, which plans informative exploration paths to complete missions with multiple tasks specified in natural language.
Findings
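In simulation, HALO achieves up to a 68% higher competitive ratio (measured by distance traveled) than the state-of-the-art semantic exploration baseline, and consistently completes tasks with less exploration time.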
Completes large-scale missions covering up to 24,600 sq. m. at an altitude of 40 m.
All modules operate onboard a custom quadrotor in real-world tests.
Abstract
We demonstrate real-time high-altitude aerial metric-semantic mapping and exploration using a monocular camera paired with a global positioning system (GPS) and an inertial measurement unit (IMU). Our system, named HALO, addresses two key challenges: (i) real-time dense 3D reconstruction using vision at large distances, and (ii) mapping and exploration of large-scale outdoor environments with accurate scene geometry and semantics. We demonstrate that HALO can plan informative paths that exploit this information to complete missions with multiple tasks specified in natural language. In simulation-based evaluation across large-scale environments of size up to 78,000 sq. m., HALO consistently completes tasks with less exploration time and achieves up to 68% higher competitive ratio in terms of the distance traveled compared to the state-of-the-art semantic exploration baseline. We use…
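The abstract reports results as a competitive ratio over distance traveled but the excerpt does not define it. The sketch below assumes one common convention, the length of a reference (for example, full-knowledge) path divided by the length actually flown, so that higher is better and 1.0 is ideal; the paper's exact definition may differ. The function names and all waypoints are hypothetical and purely illustrative, not values from the paper.

```python
import math


def path_length(waypoints):
    """Sum of straight-line segment lengths along a sequence of (x, y, z) points in metres."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))


def competitive_ratio(reference_length, traveled_length):
    """Assumed convention: reference path length / traveled path length (higher is better)."""
    if traveled_length <= 0:
        raise ValueError("traveled_length must be positive")
    return reference_length / traveled_length


def relative_gain(ratio, baseline_ratio):
    """Fractional improvement of one competitive ratio over a baseline's ratio."""
    if baseline_ratio <= 0:
        raise ValueError("baseline_ratio must be positive")
    return (ratio - baseline_ratio) / baseline_ratio


# Illustrative waypoints only (constant 40 m altitude), not data from the paper.
flown = [(0.0, 0.0, 40.0), (30.0, 0.0, 40.0), (30.0, 40.0, 40.0)]
reference = [(0.0, 0.0, 40.0), (30.0, 40.0, 40.0)]

ratio = competitive_ratio(path_length(reference), path_length(flown))
print(f"competitive ratio: {ratio:.3f}")
```

Under this convention, a statement such as "68% higher competitive ratio" would correspond to `relative_gain(method_ratio, baseline_ratio)` being about 0.68 for the best-case environment.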
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
