Real-time Joint Object Detection and Semantic Segmentation Network for Automated Driving
Ganesh Sistu, Isabelle Leang, Senthil Yogamani

TL;DR
This paper introduces a real-time multi-task CNN that jointly performs object detection and semantic segmentation for automated driving, sharing an encoder to improve efficiency on embedded systems.
Contribution
A novel joint network architecture that combines object detection and semantic segmentation behind a shared encoder, optimized for real-time performance on low-power embedded hardware.
Findings
Achieves 30 fps on 1280x384 images
Maintains accuracy comparable to separate networks
Validated on KITTI, Cityscapes, and private datasets
Abstract
Convolutional Neural Networks (CNNs) are successfully used for various visual perception tasks including bounding box object detection, semantic segmentation, optical flow, depth estimation and visual SLAM. Generally these tasks are independently explored and modeled. In this paper, we present a joint multi-task network design for learning object detection and semantic segmentation simultaneously. The main motivation is to achieve real-time performance on a low-power embedded SoC by sharing the encoder between both tasks. We construct an efficient architecture using a small ResNet10-like encoder which is shared by both decoders. Object detection uses a YOLO v2-like decoder and semantic segmentation uses an FCN8-like decoder. We evaluate the proposed network on two public datasets (KITTI, Cityscapes) and on our private fisheye camera dataset, and demonstrate that the joint network provides the…
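The shared-encoder idea in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' network: `CountingEncoder`, `detection_head`, `segmentation_head`, and `joint_forward` are hypothetical names, the "encoder" is just 2x2 average pooling, and the two "decoders" are simple thresholding stand-ins for the YOLO v2-like and FCN8-like heads. The only point it shows is structural: the encoder runs once per image and its features feed both task heads.

```python
from typing import List, Tuple

Image = List[List[float]]      # H x W single-channel image (H and W even)
Features = List[List[float]]   # downsampled feature map


class CountingEncoder:
    """Toy stand-in for the shared encoder: 2x2 average pooling.

    Counts its own invocations so the sharing is observable.
    """

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: Image) -> Features:
        self.calls += 1
        h, w = len(image), len(image[0])
        return [
            [
                (image[r][c] + image[r][c + 1]
                 + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
                for c in range(0, w - 1, 2)
            ]
            for r in range(0, h - 1, 2)
        ]


def detection_head(features: Features) -> List[Tuple[int, int, float]]:
    """Toy detection decoder: one (row, col, score) per strongly active cell."""
    return [
        (r, c, v)
        for r, row in enumerate(features)
        for c, v in enumerate(row)
        if v > 0.5
    ]


def segmentation_head(features: Features) -> List[List[int]]:
    """Toy segmentation decoder: threshold, then 2x nearest-neighbor upsampling."""
    mask: List[List[int]] = []
    for row in features:
        upsampled = [int(v > 0.5) for v in row for _ in range(2)]
        mask.append(upsampled)
        mask.append(list(upsampled))
    return mask


def joint_forward(encoder: CountingEncoder, image: Image):
    """Run the encoder once and feed the same features to both heads."""
    features = encoder(image)
    return detection_head(features), segmentation_head(features)
```

Calling `joint_forward` on a single image increments `encoder.calls` by one, whereas running two separate single-task networks would evaluate an encoder twice. That saved encoder pass is the efficiency argument the abstract makes for embedded hardware.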
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · CCD and CMOS Imaging Sensors
