Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos

Bo Xiong; Suyog Dutt Jain; Kristen Grauman

arXiv:1808.04702 · cs.CV · December 19, 2018 · 5 cites

Open Access

TL;DR

This paper introduces an end-to-end deep learning framework that segments generic objects in images and videos, including unseen categories, by combining appearance and motion cues and leveraging weak annotations for training.

Contribution

It presents a novel structured prediction model that integrates appearance and motion for object segmentation and exploits weakly labeled data to improve training efficiency.

Findings

01. Achieves state-of-the-art results on multiple segmentation benchmarks.

02. Effectively segments unseen object categories in images and videos.

03. Enhances image retrieval and retargeting using high-quality foreground maps.

Abstract

We propose an end-to-end learning framework for segmenting generic objects in both images and videos. Given a novel image or video, our approach produces a pixel-level mask for all "object-like" regions, even for object categories never seen during training. We formulate the task as a structured prediction problem of assigning an object/background label to each pixel, implemented using a deep fully convolutional network. When applied to a video, our model further incorporates a motion stream; the network learns to combine appearance and motion and attempts to extract all prominent objects, whether they are moving or not. Beyond the core model, a second contribution of our approach is how it leverages varying strengths of training annotations. Pixel-level annotations are quite difficult to obtain, yet crucial for training a deep network approach for segmentation. Thus we propose…
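To make the per-pixel object/background formulation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the score grids stand in for the outputs of a fully convolutional network, and the names `softmax2`, `objectness_map`, `fuse`, and `to_mask`, the weighted-average fusion rule, and the 0.5 threshold are all assumptions chosen for illustration. The abstract says only that the network learns to combine appearance and motion, so the fixed average below is a stand-in for that learned combination.

```python
import math


def softmax2(bg, obj):
    """Return P(object) from one pixel's (background, object) scores."""
    m = max(bg, obj)  # subtract the max for numerical stability
    e_bg, e_obj = math.exp(bg - m), math.exp(obj - m)
    return e_obj / (e_bg + e_obj)


def objectness_map(scores):
    """Convert an H x W grid of (background, object) score pairs
    into an H x W grid of object probabilities."""
    return [[softmax2(bg, obj) for bg, obj in row] for row in scores]


def fuse(appearance, motion, weight=0.5):
    """Hypothetical fusion: a fixed weighted average of two probability maps.
    (Assumption: the paper's model learns its combination instead.)"""
    return [
        [weight * a + (1.0 - weight) * m for a, m in zip(row_a, row_m)]
        for row_a, row_m in zip(appearance, motion)
    ]


def to_mask(prob, threshold=0.5):
    """Label each pixel 1 (object) or 0 (background)."""
    return [[1 if p > threshold else 0 for p in row] for row in prob]


# Toy 2 x 2 "frame": made-up scores standing in for the two streams' outputs.
appearance_scores = [[(2.0, 0.0), (0.0, 3.0)],
                     [(1.0, 1.0), (0.0, 2.0)]]
motion_scores = [[(1.0, 0.0), (0.0, 1.0)],
                 [(0.0, 4.0), (3.0, 0.0)]]

appearance_prob = objectness_map(appearance_scores)
motion_prob = objectness_map(motion_scores)

image_mask = to_mask(appearance_prob)                      # still image: appearance only
video_mask = to_mask(fuse(appearance_prob, motion_prob))   # video: both streams
print(image_mask)
print(video_mask)
```

For a still image only the appearance map is thresholded; for a video frame the two maps are fused first, so a pixel that looks ambiguous by appearance can still be labeled as object when the motion scores support it, and the reverse.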

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications