Visuomotor Understanding for Representation Learning of Driving Scenes

Seokju Lee; Junsik Kim; Tae-Hyun Oh; Yongseop Jeong; Donggeun Yoo; Stephen Lin; In So Kweon

arXiv:1909.06979·cs.CV·September 17, 2019·6 cites

TL;DR

This paper introduces a self-supervised representation learning approach that uses dashboard-camera video paired with vehicle sensing data to improve driving-scene understanding, outperforming competing unsupervised representations on semantic segmentation.

Contribution

It proposes a novel end-to-end framework that predicts dense optical flow from a single frame using paired sensing data, capturing semantic and geometric knowledge for driving scene representation.
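
As a rough illustration of why this task calls for geometric knowledge, the sketch below computes the optical flow that a static scene would induce in a camera moving straight ahead. It is not the paper's network: it assumes an ideal pinhole camera, pure forward translation, and a known per-pixel depth map, and the function name and parameters are hypothetical. It shows that the flow at each pixel depends on both the vehicle speed (available from paired sensing data) and the scene depth (which a model would have to infer from the image).

```python
def forward_motion_flow(depth, speed_mps, dt, cx, cy):
    """Per-pixel flow (du, dv) for a static scene and a forward-moving pinhole camera.

    depth: 2D list of depths in metres (illustrative input, assumed known here).
    speed_mps, dt: vehicle speed and frame interval; cx, cy: principal point in pixels.
    """
    travelled = speed_mps * dt  # distance the camera moves between the two frames
    flow = []
    for y, row in enumerate(depth):
        flow_row = []
        for x, z in enumerate(row):
            if z <= travelled:
                raise ValueError("point would pass behind the camera")
            # A point at depth z ends up at depth z - travelled, so its image
            # offset from the principal point grows by travelled / (z - travelled).
            scale = travelled / (z - travelled)
            flow_row.append(((x - cx) * scale, (y - cy) * scale))
        flow.append(flow_row)
    return flow


# Toy 3x3 depth maps: the same speed produces larger flow for a nearer scene.
far = forward_motion_flow([[10.0] * 3 for _ in range(3)], 10.0, 0.1, cx=1, cy=1)
near = forward_motion_flow([[5.0] * 3 for _ in range(3)], 10.0, 0.1, cx=1, cy=1)
```

In this toy setting the flow radiates outward from the principal point and grows as depth shrinks, so a model that predicts flow from one frame plus speed must implicitly estimate depth. Moving objects, which the abstract also mentions, break the static-scene assumption and are not modelled here.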

Findings

01. Outperforms competing unsupervised representations on semantic segmentation

02. Learns semantic and geometric scene understanding from unlabeled data

03. Enhances downstream tasks requiring detailed scene comprehension

Abstract

Dashboard cameras capture a tremendous amount of driving scene video each day. These videos are purposefully coupled with vehicle sensing data, such as from the speedometer and inertial sensors, providing an additional sensing modality for free. In this work, we leverage the large-scale unlabeled yet naturally paired data for visual representation learning in the driving scenario. A representation is learned in an end-to-end self-supervised framework for predicting dense optical flow from a single frame with paired sensing data. We postulate that success on this task requires the network to learn semantic and geometric knowledge in the ego-centric view. For example, forecasting a future view to be seen from a moving vehicle requires an understanding of scene depth, scale, and movement of objects. We demonstrate that our learned representation can benefit other tasks that require…

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Remote Sensing and LiDAR Applications