Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark
Xiaofeng Wang, Zheng Zhu, Yunpeng Zhang, Guan Huang, Yun Ye, Wenbo Xu, Ziwei Chen, Xingang Wang

TL;DR
The paper introduces the ASAP benchmark to evaluate the real-time performance of vision-centric perception models in autonomous driving, emphasizing the importance of latency and computational constraints for practical deployment.
Contribution
It presents the first streaming perception benchmark for autonomous driving, including a high-frame-rate annotation pipeline and evaluation protocol under resource constraints.
Findings
Model performance varies with computational resources.
Latency and efficiency are critical for real-world deployment.
Baseline models show improved streaming detection across hardware.
Abstract
In recent years, vision-centric perception has flourished in various autonomous driving tasks, including 3D detection, semantic map construction, motion forecasting, and depth estimation. Nevertheless, the latency of vision-centric approaches is too high for practical deployment (e.g., most camera-based 3D detectors have a runtime greater than 300ms). To bridge the gap between ideal research and real-world applications, it is necessary to quantify the trade-off between performance and efficiency. Traditionally, autonomous-driving perception benchmarks perform the offline evaluation, neglecting the inference time delay. To mitigate the problem, we propose the Autonomous-driving StreAming Perception (ASAP) benchmark, which is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving. On the basis of the 2Hz annotated nuScenes dataset, we…
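To make the difference between offline and online (streaming) evaluation concrete, here is a minimal sketch of how inference delay changes which prediction is available at evaluation time. It is an illustration under stated assumptions, not the ASAP protocol itself: the 10 Hz input rate, the single-model "process the newest frame when idle" policy, and the names `simulate_stream`, `latest_finished_prediction`, and `staleness_ms` are all hypothetical. Only the 300 ms latency figure comes from the abstract.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def simulate_stream(frame_times_ms: List[int], latency_ms: int) -> List[Tuple[int, int]]:
    """Simulate one model that, whenever idle, processes the newest available frame.

    Returns (input_frame_time, finish_time) pairs in milliseconds.
    Assumption: constant inference latency and no queuing of stale frames.
    """
    outputs = []
    free_at = 0  # time at which the model becomes idle
    i = 0
    while i < len(frame_times_ms):
        start = max(free_at, frame_times_ms[i])
        # Skip ahead to the newest frame that has arrived by `start`.
        j = max(i, bisect_right(frame_times_ms, start) - 1)
        finish = start + latency_ms
        outputs.append((frame_times_ms[j], finish))
        free_at = finish
        i = j + 1
    return outputs


def latest_finished_prediction(outputs: List[Tuple[int, int]], query_ms: int) -> Optional[int]:
    """Index of the most recent prediction finished at or before `query_ms`, else None."""
    finish_times = [finish for _, finish in outputs]
    idx = bisect_right(finish_times, query_ms) - 1
    return idx if idx >= 0 else None


def staleness_ms(outputs: List[Tuple[int, int]], query_ms: int) -> Optional[int]:
    """Age of the sensor data behind the prediction that is usable at `query_ms`."""
    idx = latest_finished_prediction(outputs, query_ms)
    if idx is None:
        return None  # no prediction has finished yet
    return query_ms - outputs[idx][0]


# Hypothetical setup: frames every 100 ms (10 Hz), 300 ms inference latency.
frames = list(range(0, 1000, 100))
outputs = simulate_stream(frames, latency_ms=300)

for t in (100, 500, 600):
    print(t, staleness_ms(outputs, t))
```

An offline benchmark would score the prediction for frame `t` against the ground truth at `t`, as if staleness were zero. In the streaming setting, the prediction available at time `t` was computed from older sensor data, so a slower model is compared against a world state that has already moved on.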
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Visual Attention and Saliency Detection
