AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration
Xiangbo Gao, Yuheng Wu, Fengze Yang, Xuewen Luo, Keshu Wu, Xinghao Chen, Yuping Wang, Chenxi Liu, Yang Zhou, Zhengzhong Tu

TL;DR
AirV2X introduces a large-scale drone-assisted dataset for vehicle perception, enabling improved V2X collaboration through aerial data, which reduces occlusions and lowers infrastructure costs in autonomous driving.
Contribution
The paper presents AirV2X-Perception, a comprehensive drone-based dataset for V2X systems, filling a gap in aerial-assisted autonomous driving research.
Findings
Dataset includes 6.73 hours of diverse driving scenarios.
Enables development and evaluation of Vehicle-to-Drone algorithms.
Open-sourced development kits support research advancement.
Abstract
While multi-vehicular collaborative driving demonstrates clear advantages over single-vehicle autonomy, traditional infrastructure-based V2X systems remain constrained by substantial deployment costs and the creation of "uncovered danger zones" in rural and suburban areas. We present AirV2X-Perception, a large-scale dataset that leverages Unmanned Aerial Vehicles (UAVs) as a flexible alternative or complement to fixed Road-Side Units (RSUs). Drones offer unique advantages over ground-based perception: complementary bird's-eye-views that reduce occlusions, dynamic positioning capabilities that enable hovering, patrolling, and escorting navigation rules, and significantly lower deployment costs compared to fixed infrastructure. Our dataset comprises 6.73 hours of drone-assisted driving scenarios across urban, suburban, and rural environments with varied weather and lighting conditions.…
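The three navigation rules named in the abstract (hovering, patrolling, escorting) can be illustrated with a minimal sketch. The definitions here are assumptions for illustration, not the paper's implementation: hover holds a fixed position, patrol moves back and forth between two waypoints at constant speed, and escort follows a vehicle at a fixed horizontal offset and altitude. All function names, parameters, and numeric values are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z) in metres; hypothetical world frame


def hover(anchor: Vec3, t: float) -> Vec3:
    """Assumed 'hover' rule: hold a fixed position for all times t."""
    return anchor


def patrol(start: Vec3, end: Vec3, speed: float, t: float) -> Vec3:
    """Assumed 'patrol' rule: shuttle between start and end at constant speed."""
    length = math.dist(start, end)
    if length == 0.0 or speed <= 0.0:
        return start
    # Fold the travelled distance into a triangle wave over [0, length].
    d = (speed * t) % (2.0 * length)
    if d > length:
        d = 2.0 * length - d
    frac = d / length
    return tuple(s + frac * (e - s) for s, e in zip(start, end))


def escort(vehicle_xy: Tuple[float, float], altitude: float,
           offset_xy: Tuple[float, float] = (0.0, 0.0)) -> Vec3:
    """Assumed 'escort' rule: stay above the vehicle at a fixed offset and altitude."""
    return (vehicle_xy[0] + offset_xy[0], vehicle_xy[1] + offset_xy[1], altitude)


if __name__ == "__main__":
    a, b = (0.0, 0.0, 50.0), (100.0, 0.0, 50.0)
    print(hover(a, 3.0))               # fixed position
    print(patrol(a, b, 10.0, 5.0))     # halfway along the segment
    print(patrol(a, b, 10.0, 15.0))    # halfway again, on the return leg
    print(escort((12.0, -4.0), 50.0))  # directly above the vehicle
```

In this sketch only escort depends on another agent's state, which is one way to read why the mode is useful for following a specific vehicle through areas without fixed RSU coverage.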
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The dataset includes urban and rural distributions, lighting condition distributions, and weather distributions, and also contains three types of drone missions. Its comprehensive design makes it suitable for robust algorithm evaluation.
- The paper provides a complete evaluation of this dataset for several cutting-edge algorithms, analyzes the performance of various algorithms under different conditions and the reasons behind their performance, and examines the trade-offs between performance
- **First, it is worth noting that the author included an arXiv link https://arxiv.org/abs/2506.19283 and a Hugging Face link https://huggingface.co/datasets/xiangbog/AirV2X-Perception/viewer?views%5B%5D=train in the anonymous link. This may violate the anonymity rules, and the chairs need to decide.**
- The paper mentions that RSUs, due to economic constraints, are only installed at high-traffic intersections and critical urban junctions, while drones, due to their low cost and high dynamic
1. The paper's primary contribution is the first large-scale dataset to systematically integrate vehicles, RSUs, and drones, which it clearly positions against prior work. The dataset features impressive scale, agent complexity, and environmental diversity.
2. The paper also provides an extensive and valuable benchmark of perception algorithms. The analysis of drone-specific navigation dynamics (hover, patrol, escort) is a novel and valuable feature.
3. The overall presentation is clear and well
Major Concerns: the sim2real gap. This manifests in two key areas:
1. The specified drone LiDAR configuration (Table 2) is unconventional. A 60-degree vertical FOV pointing exclusively downwards does not correspond to common, commercially available spinning LiDARs. If this setup is purely theoretical, it diminishes the dataset's utility for developing algorithms intended for real-world hardware.
2. While the three drone navigation modes (hover, patrol, escort) are conceptually clear (Section 3.
- `Novel V2X perspective:` Proposes vehicle-to-drone (V2D) collaboration as a cost-effective alternative to fixed RSUs. The aerial viewpoint offers wide FOV and line-of-sight advantages, which plausibly improves perception in occluded scenarios.
- `Reproducible data pipeline:` Provides a clear CARLA+AirSim co-simulation procedure and 6.73 hours of data, enabling the community to study V2D and to recreate/extend the dataset with the described workflow.
- `Empirical support:` 3D object detection
- `Necessity and task coverage:` The benefit of an aerial viewpoint is demonstrated primarily in 3D detection. It is unclear whether this advantage transfers to other key perception tasks (online mapping, traffic light recognition, lane/topology extraction, HD map maintenance). Suggest broadening the evaluation to BEV segmentation and lane/centerline extraction.
- `Lack of end-to-end, closed-loop evaluation:` The paper does not assess whether V2D measurably improves driving quality. Without closed
Taxonomy
Topics: Simulation Techniques and Applications · Air Traffic Management and Optimization
