Fast Scene Understanding for Autonomous Driving

Davy Neven; Bert De Brabandere; Stamatios Georgoulis; Marc Proesmans; Luc Van Gool

arXiv:1708.02550 · cs.CV · August 10, 2017 · 49 cites

TL;DR

This paper introduces an efficient multi-task architecture based on ENet that performs semantic scene segmentation, instance segmentation, and monocular depth estimation simultaneously. It runs at 21 fps at 1024x512 resolution on Cityscapes without sacrificing accuracy compared to running each task separately.

Contribution

The paper presents a unified multi-task deep learning model that performs three autonomous driving perception tasks in real time, using a shared encoder with a separate decoder branch for each task.

Findings

01 Runs at 21 fps on the Cityscapes dataset at 1024x512 resolution
02 Maintains accuracy comparable to running each task as a separate single-task model
03 Handles all three tasks in one multi-task network that is efficient enough for real-time autonomous driving
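
To put the reported throughput in concrete terms, the short calculation below converts 21 fps at 1024x512 into a per-frame time budget and a pixel rate. It is plain arithmetic on the two figures quoted above; the variable names are illustrative and nothing here is measured.

```python
# Figures quoted in the findings above.
FPS = 21
WIDTH, HEIGHT = 1024, 512

# Time available to process one frame, in milliseconds.
frame_budget_ms = 1000.0 / FPS

# Pixels handled per frame and per second at that rate.
pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS

print(f"frame budget: {frame_budget_ms:.1f} ms")      # frame budget: 47.6 ms
print(f"pixels per frame: {pixels_per_frame:,}")      # pixels per frame: 524,288
print(f"pixels per second: {pixels_per_second:,}")    # pixels per second: 11,010,048
```

In other words, all three task outputs have to be produced within roughly 48 ms per frame to sustain the stated rate.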

Abstract

Most approaches for instance-aware semantic labeling traditionally focus on accuracy. Other aspects like runtime and memory footprint are arguably as important for real-time applications such as autonomous driving. Motivated by this observation and inspired by recent works that tackle multiple tasks with a single integrated architecture, in this paper we present a real-time efficient implementation based on ENet that solves three autonomous driving related tasks at once: semantic scene segmentation, instance segmentation and monocular depth estimation. Our approach builds upon a branched ENet architecture with a shared encoder but different decoder branches for each of the three tasks. The presented method can run at 21 fps at a resolution of 1024x512 on the Cityscapes dataset without sacrificing accuracy compared to running each task separately.
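
The branched layout described in the abstract can be sketched structurally as follows. This is a toy illustration in plain Python, not the authors' implementation: `BranchedModel`, the `toy_*` functions, and the `encoder_calls` counter are hypothetical names, and the arithmetic inside each function is a placeholder for learned ENet stages. The only point it makes is that the encoder runs once per frame and every task branch consumes the same features.

```python
from typing import Callable, Dict, List

Grid = List[List[float]]  # toy H x W grid standing in for an image or feature map


class BranchedModel:
    """One shared encoder feeding several task-specific decoder branches."""

    def __init__(self, encoder: Callable[[Grid], Grid],
                 decoders: Dict[str, Callable[[Grid], list]]):
        self.encoder = encoder
        self.decoders = decoders
        self.encoder_calls = 0  # counts encoder passes, for illustration only

    def forward(self, image: Grid) -> Dict[str, list]:
        # The shared encoder runs once per frame.
        features = self.encoder(image)
        self.encoder_calls += 1
        # Every decoder branch reuses the same encoded features.
        return {task: decode(features) for task, decode in self.decoders.items()}


# Hypothetical placeholders; real branches would be learned decoder networks.
def toy_encoder(image: Grid) -> Grid:
    return [[2.0 * v for v in row] for row in image]

def toy_semantic(features: Grid) -> list:
    return [[int(v > 1.0) for v in row] for row in features]  # fake class ids

def toy_instance(features: Grid) -> list:
    return [[int(v) for v in row] for row in features]        # fake instance ids

def toy_depth(features: Grid) -> list:
    return [[1.0 / (1.0 + v) for v in row] for row in features]  # fake depth


model = BranchedModel(toy_encoder, {
    "semantic": toy_semantic,
    "instance": toy_instance,
    "depth": toy_depth,
})
outputs = model.forward([[0.0, 1.0], [2.0, 3.0]])
print(sorted(outputs))       # ['depth', 'instance', 'semantic']
print(model.encoder_calls)   # 1
```

Running three separate single-task models would repeat the encoder pass three times per frame; sharing it is where a branched design of this kind saves computation.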

Code & Models

Repositories

davyneven/fastSceneUnderstanding
tf

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Dilated Convolution · 1x1 Convolution · Batch Normalization · Max Pooling · Convolution · ENet Dilated Bottleneck · ENet Bottleneck · ENet Initial Block · SpatialDropout · Parameterized ReLU