BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object Detection for Autonomous Driving
Hazem Rashed, Mariam Essam, Maha Mohamed, Ahmad El Sallab, Senthil Yogamani

TL;DR
This paper introduces BEV-MODNet, an end-to-end monocular camera-based system for detecting moving objects directly in Bird's Eye View space, along with a new dataset for training and evaluation.
Contribution
The work presents a novel dataset and a two-stream neural network architecture for direct BEV motion segmentation from monocular images, improving over traditional inverse perspective mapping methods.
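For context, the inverse perspective mapping step that the direct BEV approach is compared against can be sketched for a single pixel as follows. This is a minimal illustration, not the paper's implementation: it assumes a pinhole camera with zero pitch and roll over a perfectly flat road, and the function name `pixel_to_ground` and the intrinsics used in the example are hypothetical.

```python
def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height):
    """Back-project image pixel (u, v) onto a flat ground plane.

    Assumptions (illustrative only): pinhole camera, zero pitch and roll,
    optical centre cam_height metres above a perfectly flat road.
    Returns (x_right, z_forward) in metres, or None when the pixel lies
    on or above the horizon row and never meets the ground.
    """
    dv = v - cy  # rows below the principal point look down at the road
    if dv <= 0:
        return None
    z_forward = fy * cam_height / dv      # similar triangles: depth along the road
    x_right = (u - cx) * z_forward / fx   # lateral offset at that depth
    return x_right, z_forward


# Hypothetical intrinsics, chosen only to make the arithmetic easy to follow.
FX = FY = 700.0
CX, CY = 600.0, 180.0
CAM_HEIGHT = 1.65

near = pixel_to_ground(600, 250, FX, FY, CX, CY, CAM_HEIGHT)
far_a = pixel_to_ground(600, 185, FX, FY, CX, CY, CAM_HEIGHT)
far_b = pixel_to_ground(600, 186, FX, FY, CX, CY, CAM_HEIGHT)
```

With these example values, the pixel 70 rows below the principal point maps to a point about 16.5 m ahead, while the two adjacent rows close to the horizon map to about 231 m and 192.5 m. A one-pixel error in the image therefore shifts a distant ground point by tens of metres, which is one reason this kind of mapping becomes noisy far from the camera.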
Findings
Achieved a 13% improvement in mean Intersection over Union (mIoU) over baseline methods.
Created a new dataset with 12.9k images and BEV annotations.
Demonstrated effective direct BEV motion segmentation from monocular input.
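The mIoU metric cited above can be sketched for a small BEV grid as follows. This is a generic illustration under stated assumptions, not the paper's evaluation code: it assumes a binary labelling (0 = static or background, 1 = moving) and averages IoU over both classes, and the helper names `class_iou` and `mean_iou` are hypothetical. The summary does not state which classes the paper averages over.

```python
def class_iou(pred, target, cls):
    """IoU for one class over two equally sized 2D label grids.

    Returns None when the class appears in neither grid, so that the
    caller can leave it out of the mean.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_hit = p == cls
            t_hit = t == cls
            if p_hit and t_hit:
                inter += 1
            if p_hit or t_hit:
                union += 1
    return inter / union if union else None


def mean_iou(pred, target, classes=(0, 1)):
    """Average the per-class IoU over the classes present in either grid."""
    scores = [class_iou(pred, target, c) for c in classes]
    scores = [s for s in scores if s is not None]
    if not scores:
        raise ValueError("none of the requested classes appear in the grids")
    return sum(scores) / len(scores)


# Toy 2x3 BEV grids: 1 marks a cell occupied by a moving object.
pred = [[0, 1, 1],
        [0, 0, 1]]
target = [[0, 1, 0],
          [0, 0, 1]]

score = mean_iou(pred, target)
```

In this toy case the moving class has IoU 2/3 and the static class has IoU 3/4, so `score` is 17/24, roughly 0.708.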
Abstract
Detection of moving objects is a very important task in autonomous driving systems. After the perception phase, motion planning is typically performed in Bird's Eye View (BEV) space. This would require projection of objects detected on the image plane to top view BEV plane. Such a projection is prone to errors due to lack of depth information and noisy mapping in far away areas. CNNs can leverage the global context in the scene to project better. In this work, we explore end-to-end Moving Object Detection (MOD) on the BEV map directly using monocular images as input. To the best of our knowledge, such a dataset does not exist and we create an extended KITTI-raw dataset consisting of 12.9k images with annotations of moving object masks in BEV space for five classes. The dataset is intended to be used for class agnostic motion cue based object detection and classes are provided as…
