TL;DR
This paper introduces a single-stage CNN, trained with weak supervision, for semantic foreground inpainting. It outperforms the previous two-stage state-of-the-art method on Cityscapes and on the unseen KITTI dataset.
Contribution
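It proposes a single-stage CNN built around a max-pooling as inpainting (MPI) module. The network is trained with weak supervision, i.e., without manual background annotations for the foreground regions to be inpainted, and is more efficient than the previous two-stage approach.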
Findings
Outperforms the previous two-stage state-of-the-art method by 3% IoU for the inpainted foreground regions on Cityscapes.
Widens that margin to 6% IoU when tested on the unseen KITTI dataset.
Provides code and datasets for community use.
Abstract
Semantic scene understanding is an essential task for self-driving vehicles and mobile robots. In our work, we aim to estimate a semantic segmentation map, in which the foreground objects are removed and semantically inpainted with background classes, from a single RGB image. This semantic foreground inpainting task is performed by a single-stage convolutional neural network (CNN) that contains our novel max-pooling as inpainting (MPI) module, which is trained with weak supervision, i.e., it does not require manual background annotations for the foreground regions to be inpainted. Our approach is inherently more efficient than the previous two-stage state-of-the-art method, and outperforms it by a margin of 3% IoU for the inpainted foreground regions on Cityscapes. The performance margin increases to 6% IoU, when tested on the unseen KITTI dataset. The code and the manually annotated…
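The reported margins are IoU scores restricted to the inpainted foreground regions. The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of how a region-restricted mean IoU could be computed. The function name `masked_iou`, the integer class ids, and the toy arrays are all assumptions made for illustration, not the authors' evaluation code.

```python
def masked_iou(pred, target, mask, num_classes):
    """Mean IoU over classes, counting only pixels where mask is truthy.

    pred, target: 2D lists of integer class ids (hypothetical label maps).
    mask: 2D list marking the foreground region to evaluate (1 = evaluate).
    Classes that never occur inside the mask are skipped in the mean.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if not m:
                continue  # pixel lies outside the evaluated region
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                # a mismatch enlarges the union of both classes involved
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with two hypothetical background classes (0 and 1).
pred = [[0, 0, 1],
        [1, 1, 0]]
target = [[0, 1, 1],
          [1, 1, 0]]
mask = [[0, 1, 1],
        [1, 1, 0]]

print(masked_iou(pred, target, mask, num_classes=2))  # 0.375
```

In the toy example, class 0 scores 0/1 and class 1 scores 3/4 inside the mask, giving a mean of 0.375. Pixels outside the mask do not affect the score, even where prediction and target agree.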
