Virtual Worlds as Proxy for Multi-Object Tracking Analysis
Adrien Gaidon, Qiao Wang, Yohann Cabon, Eleonora Vig

TL;DR
This paper introduces a method to generate photo-realistic virtual worlds that clone real-world scenes for training and evaluating computer vision algorithms, and shows that such virtual data can serve as an effective proxy for real data in multi-object tracking and scene understanding tasks.
Contribution
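It presents an efficient real-to-virtual world cloning method and releases Virtual KITTI, a new automatically labeled video dataset. Experiments show that pre-training on virtual data improves deep learning performance, and that virtual worlds allow controlled testing of environmental factors such as weather and imaging conditions.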
Findings
Models pre-trained on real data perform similarly in real and virtual worlds.
Pre-training on virtual data improves real-world performance.
Weather and imaging conditions significantly impact tracking accuracy.
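Comparing tracking accuracy across conditions such as weather or imaging changes needs a single scalar per run. A common choice in multi-object tracking is MOTA from the CLEAR MOT metrics, which counts misses, false positives, and identity switches against the number of ground-truth objects. The sketch below assumes those per-frame counts have already been obtained by matching tracker output to ground truth; the `FrameCounts` type and the toy numbers are hypothetical illustrations, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class FrameCounts:
    """Per-frame error counts after matching tracker output to ground truth (hypothetical type)."""
    false_negatives: int  # ground-truth objects the tracker missed
    false_positives: int  # tracker outputs with no matching ground-truth object
    id_switches: int      # matched objects whose track identity changed
    num_gt: int           # ground-truth objects present in the frame


def mota(frames: list[FrameCounts]) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with every term summed over all frames."""
    total_gt = sum(f.num_gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    errors = sum(f.false_negatives + f.false_positives + f.id_switches for f in frames)
    return 1.0 - errors / total_gt


# Toy counts for two rendering conditions of the same scene (illustrative only).
clear_weather = [FrameCounts(1, 0, 0, 10), FrameCounts(0, 1, 0, 10)]
fog = [FrameCounts(3, 2, 1, 10), FrameCounts(4, 1, 0, 10)]
print(mota(clear_weather), mota(fog))
```

Because a cloned virtual scene can be re-rendered with only one factor changed, the difference between the two MOTA values isolates that factor's effect, which is hard to arrange with real footage.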
Abstract
Modern computer vision algorithms typically require expensive data acquisition and accurate manual labeling. In this work, we instead leverage the recent progress in computer graphics to generate fully labeled, dynamic, and photo-realistic proxy virtual worlds. We propose an efficient real-to-virtual world cloning method, and validate our approach by building and publicly releasing a new video dataset, called Virtual KITTI (see http://www.xrce.xerox.com/Research-Development/Computer-Vision/Proxy-Virtual-Worlds), automatically labeled with accurate ground truth for object detection, tracking, scene and instance segmentation, depth, and optical flow. We provide quantitative experimental evidence suggesting that (i) modern deep learning algorithms pre-trained on real data behave similarly in real and virtual worlds, and (ii) pre-training on virtual data improves performance. As the gap…
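In a rendered world every pixel belongs to a known object, so labels can be derived programmatically instead of drawn by hand. As one illustration, tight 2D boxes can be read off a per-pixel instance-id map. This is a minimal sketch, assuming a hypothetical map format with `0` as background; it is not the authors' labeling pipeline, and it yields the box of each object's visible pixels only (occluded or truncated parts are not recovered).

```python
from typing import Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def boxes_from_instance_map(instance_map: List[List[int]], background: int = 0) -> Dict[int, Box]:
    """Return the tight bounding box of every non-background instance id in the map."""
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(instance_map):
        for x, inst_id in enumerate(row):
            if inst_id == background:
                continue  # skip pixels that belong to no object
            if inst_id not in boxes:
                boxes[inst_id] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[inst_id]
                boxes[inst_id] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


# A 3x4 toy map with two instances (ids 1 and 2).
toy_map = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 2, 0, 0],
]
print(boxes_from_instance_map(toy_map))
```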
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Visual Attention and Saliency Detection
