Unsupervised Learning of Important Objects from First-Person Videos

Gedas Bertasius; Hyun Soo Park; Stella X. Yu; Jianbo Shi

arXiv:1611.05335·cs.CV·August 3, 2017·2 cites



TL;DR

This paper introduces an unsupervised method for detecting important objects in first-person videos by jointly learning segmentation and recognition without requiring manual importance labels, using a novel Visual-Spatial Network architecture.

Contribution

The work presents a new unsupervised learning framework with a Visual-Spatial Network that detects important objects without human-provided importance annotations.

Findings

1. Achieves results comparable to or better than those of supervised methods on two datasets.
2. Introduces a cross-pathway supervision scheme within the Visual-Spatial Network.
3. Demonstrates effective important object detection without manual labels.

Abstract

A first-person camera, placed at a person's head, captures which objects are important to the camera wearer. Most prior methods for this task learn to detect such important objects from manually labeled first-person data in a supervised fashion. However, important objects are strongly related to the camera wearer's internal state, such as his intentions and attention, and thus only the person wearing the camera can provide the importance labels. Such a constraint makes the annotation process costly and limited in scalability. In this work, we show that we can detect important objects in first-person images without supervision from the camera wearer or even from third-person labelers. We formulate the important object detection problem as an interplay between 1) segmentation and 2) recognition agents. The segmentation agent first proposes a possible important object segmentation mask for…
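To make the idea of one learner supervising another concrete, here is a minimal toy sketch of alternating cross-pathway pseudo-labeling. It is not the authors' Visual-Spatial Network. It assumes, as the network's name suggests, one pathway driven by visual appearance and one by spatial location, and it reduces each to a one-feature logistic regressor. Everything in it is hypothetical: the names `Pathway`, `segmentation_agent`, and `cross_pathway_training`, the synthetic scalar features, and the "center prior" used to seed the first mask. In each round, the mask that one pathway's predictions produce is used as the training target for the other pathway.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Pathway:
    """Hypothetical stand-in for one pathway: a 1-feature logistic regressor."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def predict(self, xs):
        return [sigmoid(self.w * x + self.b) for x in xs]

    def fit(self, xs, labels, lr=0.5, epochs=200):
        # Batch gradient descent on binary cross-entropy against pseudo-labels.
        n = len(xs)
        for _ in range(epochs):
            gw = gb = 0.0
            for x, y in zip(xs, labels):
                err = sigmoid(self.w * x + self.b) - y
                gw += err * x
                gb += err
            self.w -= lr * gw / n
            self.b -= lr * gb / n


def segmentation_agent(scores, threshold=0.5):
    # Hypothetical agent: turns soft scores into a binary candidate mask.
    return [1 if s > threshold else 0 for s in scores]


def cross_pathway_training(visual, spatial, initial_mask, rounds=3):
    visual_net, spatial_net = Pathway(), Pathway()
    mask = initial_mask
    for _ in range(rounds):
        # The current mask supervises the visual pathway; its output yields a new mask.
        visual_net.fit(visual, mask)
        mask = segmentation_agent(visual_net.predict(visual))
        # That mask in turn supervises the spatial pathway, and so on.
        spatial_net.fit(spatial, mask)
        mask = segmentation_agent(spatial_net.predict(spatial))
    return visual_net, spatial_net, mask


# Synthetic toy data (assumption): hidden "important" labels are never used for training.
random.seed(0)
truth = [random.random() < 0.3 for _ in range(200)]
visual = [random.gauss(1.0 if t else -1.0, 0.5) for t in truth]
spatial = [random.gauss(1.0 if t else -1.0, 0.5) for t in truth]
center_prior = [1 if s > 0 else 0 for s in spatial]  # hypothetical seed mask

_, _, final_mask = cross_pathway_training(visual, spatial, center_prior)
accuracy = sum(int(m == t) for m, t in zip(final_mask, truth)) / len(truth)
print(f"agreement with hidden labels: {accuracy:.2f}")
```

The hidden labels `truth` are used only to measure agreement at the end, never as training targets. In this toy setting, the two pathways bootstrap from a weak heuristic mask rather than from human importance annotations.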


Code & Models

Repositories

gberta/Visual-Spatial-Network


Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications