Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos
Bo Xiong, Suyog Dutt Jain, Kristen Grauman

TL;DR
This paper introduces an end-to-end deep learning framework that segments generic objects in images and videos, including unseen categories, by combining appearance and motion cues and leveraging weak annotations for training.
Contribution
It presents a novel structured prediction model that integrates appearance and motion for object segmentation and exploits weakly labeled data to improve training efficiency.
Findings
Achieves state-of-the-art results on multiple segmentation benchmarks.
Effectively segments unseen object categories in images and videos.
Enhances image retrieval and retargeting using high-quality foreground maps.
Abstract
We propose an end-to-end learning framework for segmenting generic objects in both images and videos. Given a novel image or video, our approach produces a pixel-level mask for all "object-like" regions---even for object categories never seen during training. We formulate the task as a structured prediction problem of assigning an object/background label to each pixel, implemented using a deep fully convolutional network. When applied to a video, our model further incorporates a motion stream, and the network learns to combine both appearance and motion and attempts to extract all prominent objects whether they are moving or not. Beyond the core model, a second contribution of our approach is how it leverages varying strengths of training annotations. Pixel-level annotations are quite difficult to obtain, yet crucial for training a deep network approach for segmentation. Thus we propose…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVisual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
