LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration
Wen Jiang, Li Wang, Kangyao Huang, Wei Fan, Jinyuan Liu, Shaoyu Liu, Hongwei Duan, Bin Xu, and Xiangyang Ji

TL;DR
LongFly is a spatiotemporal context modeling framework for UAV vision-and-language navigation. It targets long-horizon flights in complex environments and improves semantic alignment and path planning accuracy.
Contribution
The paper presents a framework for long-horizon UAV navigation that combines three components: slot-based historical image compression, trajectory encoding, and prompt-guided multimodal integration.
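To make the idea of compressing a growing history into a fixed-length context more concrete, here is a minimal sketch. It is not the paper's module: it assumes plain cross-attention pooling, in which a fixed set of query vectors (called `slot_queries` here, a hypothetical name) attends over per-frame feature vectors. The function name, shapes, and random inputs are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress_history(history_feats, slot_queries):
    """Pool T history features of size D into K slots of size D.

    history_feats: array of shape (T, D), one feature vector per past frame.
    slot_queries:  array of shape (K, D), hypothetical fixed or learned queries.
    Returns an array of shape (K, D) regardless of T.
    """
    d = history_feats.shape[-1]
    scores = slot_queries @ history_feats.T / np.sqrt(d)  # (K, T)
    weights = softmax(scores, axis=-1)  # each slot's weights over frames sum to 1
    return weights @ history_feats      # (K, D)

rng = np.random.default_rng(0)
slot_queries = rng.normal(size=(4, 16))        # K = 4 slots, D = 16 (assumed sizes)
short_history = rng.normal(size=(10, 16))      # 10 past frames
long_history = rng.normal(size=(200, 16))      # 200 past frames

short_ctx = compress_history(short_history, slot_queries)
long_ctx = compress_history(long_history, slot_queries)
print(short_ctx.shape, long_ctx.shape)
```

The point of the sketch is only that the output size depends on the number of slots and not on how many frames have been observed, which is what keeps the context compact on long trajectories.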
Findings
Outperforms state-of-the-art baselines by 7.89% in success rate.
Achieves a 6.33% improvement in success weighted by path length (SPL).
Demonstrates robustness across seen and unseen environments.
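For readers unfamiliar with the two metrics named above, the sketch below computes success rate and success weighted by path length using the standard definition from the embodied navigation literature: each successful episode is weighted by the ratio of the shortest-path length to the larger of the shortest-path length and the length actually flown. The paper's exact evaluation protocol is not given in this summary, so the function names and the toy episode values are assumptions for illustration only.

```python
def success_rate(successes):
    # Fraction of episodes that ended in success (1) rather than failure (0).
    return sum(successes) / len(successes)

def success_weighted_by_path_length(successes, shortest_lengths, path_lengths):
    # Standard SPL: mean over episodes of S_i * l_i / max(p_i, l_i),
    # where l_i is the shortest-path length and p_i the length actually travelled.
    # Assumes every shortest-path length is positive.
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)

# Three toy episodes: an optimal success, a failure, and a success
# that travelled twice the shortest distance.
successes = [1, 0, 1]
shortest_lengths = [10.0, 10.0, 10.0]
path_lengths = [10.0, 20.0, 20.0]

sr = success_rate(successes)
spl = success_weighted_by_path_length(successes, shortest_lengths, path_lengths)
print(sr, spl)  # SR is 2/3; SPL is (1 + 0 + 0.5) / 3 = 0.5
```

SPL is never larger than the success rate, since inefficient routes reduce an episode's contribution while failures contribute nothing.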
Abstract
Unmanned aerial vehicles (UAVs) are crucial tools for post-disaster search and rescue, where they face challenges such as high information density, rapid changes in viewpoint, and dynamic structures, especially in long-horizon navigation. However, current UAV vision-and-language navigation (VLN) methods struggle to model long-horizon spatiotemporal context in complex environments, resulting in inaccurate semantic alignment and unstable path planning. To this end, we propose LongFly, a spatiotemporal context modeling framework for long-horizon UAV VLN. LongFly adopts a history-aware spatiotemporal modeling strategy that transforms fragmented and redundant historical data into structured, compact, and expressive representations. First, we propose the slot-based historical image compression module, which dynamically distills multi-view historical observations into fixed-length contextual…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms
