TL;DR
This paper introduces a visual perception benchmark suite of more than 250K annotated high-resolution video frames, recorded in a realistic virtual world under diverse ambient conditions, with ground-truth data for multiple low-level and high-level vision tasks.
Contribution
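It presents a benchmark dataset together with a new approach to collecting ground-truth data from simulated worlds without access to their source code or content. Statistical analyses show that the composition of the benchmark scenes closely matches that of corresponding physical environments, and the realism of the data is validated through perceptual tests.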
Findings
State-of-the-art methods show varying performance across tasks.
The scene composition of the benchmark data closely matches that of corresponding physical environments.
Challenges for future research are identified.
Abstract
We present a benchmark suite for visual perception. The benchmark is based on more than 250K high-resolution video frames, all annotated with ground-truth data for both low-level and high-level vision tasks, including optical flow, semantic instance segmentation, object detection and tracking, object-level 3D scene layout, and visual odometry. Ground-truth data for all tasks is available for every frame. The data was collected while driving, riding, and walking a total of 184 kilometers in diverse ambient conditions in a realistic virtual world. To create the benchmark, we have developed a new approach to collecting ground-truth data from simulated worlds without access to their source code or content. We conduct statistical analyses that show that the composition of the scenes in the benchmark closely matches the composition of corresponding physical environments. The realism of the…
Videos
Playing for Benchmarks · YouTube
