Bringing Background into the Foreground: Making All Classes Equal in Weakly-supervised Video Semantic Segmentation

Fatemeh Sadat Saleh; Mohammad Sadegh Aliakbarian; Mathieu Salzmann; Lars Petersson; Jose M. Alvarez

arXiv:1708.04400·cs.CV·August 16, 2017

TL;DR

This paper presents a weakly-supervised video semantic segmentation method that uses classifier heatmaps and a two-stream appearance-and-motion architecture to distinguish multiple background classes, achieving state-of-the-art results on urban scene and YouTube datasets.

Contribution

The paper introduces a new approach combining classifier heatmaps with a two-stream appearance and motion architecture for multi-background class segmentation.

Findings

01. Achieves state-of-the-art results on urban scene datasets.

02. Demonstrates the effectiveness of classifier heatmaps in weakly-supervised segmentation.

03. Shows benefits of joint appearance-motion modeling in challenging videos.

Abstract

Pixel-level annotations are expensive and time-consuming to obtain. Hence, weak supervision using only image tags could have a significant impact in semantic segmentation. Recent years have seen great progress in weakly-supervised semantic segmentation, whether from a single image or from videos. However, most existing methods are designed to handle a single background class. In practical applications, such as autonomous navigation, it is often crucial to reason about multiple background classes. In this paper, we introduce an approach to doing so by making use of classifier heatmaps. We then develop a two-stream deep architecture that jointly leverages appearance and motion, and design a loss based on our heatmaps to train it. Our experiments demonstrate the benefits of our classifier heatmaps and of our two-stream architecture on challenging urban scene datasets and on the…
