CosFly: Plan in the Matrix, Fly in the World
Hanxuan Chen, Xiangyue Wang, Songsheng Cheng, Ruilong Ren, Jie Zheng, Shuai Yuan, Tianle Zeng, Hanzhong Guo, Binbo Li, Kangli Wang, and Ji Pei

TL;DR
CosFly is a planning and multimodal simulation pipeline for UAV aerial tracking. It is released together with CosFly-Track, a large-scale dataset for dynamic target tracking across urban, highway, rural, forest, and coastal environments.
Contribution
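The paper presents a modular 7-step pipeline, currently implemented on CARLA, that converts 3D worlds into structured obstacle representations for trajectory planning and then renders the planned trajectories as multi-modal sensor data (RGB images, depth maps, and semantic segmentation masks) paired with natural language navigation instructions. It also presents CosFly-Track, a large UAV dataset for aerial tracking in complex environments.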
Findings
Supports configurable fixed-FOV zoom levels: one FOV setting is drawn per trajectory and held constant throughout, which simulates different focal lengths.
Provides two trajectory-planning paradigms: two-stage and gradient-based.
Contains 250 validated trajectories and 100,000 images with drone pose annotations.
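The link between a camera's FOV setting and its simulated focal length can be illustrated with the standard pinhole camera relation f = (W / 2) / tan(FOV / 2). The sketch below is not taken from the CosFly code. The function names, the 800x600 resolution, and the FOV values are assumptions chosen for illustration, and it assumes the FOV is horizontal and the pixels are square.

```python
import math


def focal_length_px(image_width_px: float, horizontal_fov_deg: float) -> float:
    """Pinhole focal length in pixels for a given horizontal FOV (hypothetical helper)."""
    half_fov_rad = math.radians(horizontal_fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_fov_rad)


def intrinsic_matrix(width_px: int, height_px: int, horizontal_fov_deg: float):
    """3x3 camera intrinsic matrix K, assuming square pixels and a centered principal point."""
    f = focal_length_px(width_px, horizontal_fov_deg)
    return [
        [f, 0.0, width_px / 2.0],
        [0.0, f, height_px / 2.0],
        [0.0, 0.0, 1.0],
    ]


if __name__ == "__main__":
    # Illustrative FOV values only; the paper's actual zoom levels are not listed here.
    for fov_deg in (90.0, 60.0, 30.0):
        f = focal_length_px(800, fov_deg)
        print(f"FOV {fov_deg:5.1f} deg -> focal length {f:8.2f} px")
```

Narrowing the FOV at a fixed image resolution increases the focal length in pixels, which is why a fixed per-trajectory FOV behaves like a fixed zoom level.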
Abstract
We present CosFly, a box-structured planning and multimodal simulation pipeline for aerial tracking, together with CosFly-Track, a large-scale UAV dataset for dynamic target tracking across diverse environments including urban centers, highways, rural landscapes, forests, and coastal towns. In our current implementation on CARLA, CosFly provides a modular 7-step construction pipeline that converts complex 3D worlds into structured obstacle representations for planning, then projects the resulting trajectories back into multi-modal sensor data -- including RGB images, high-precision depth maps, and semantic segmentation masks -- paired with natural language navigation instructions. A key feature is the support for configurable fixed-FOV zoom levels (one FOV setting drawn per trajectory and held constant throughout), enabling simulation of various focal lengths through camera-intrinsic…
