Unsupervised Semantic Action Discovery from Video Collections

Ozan Sener, Amir Roshan Zamir, Chenxia Wu, Silvio Savarese, and Ashutosh Saxena

arXiv:1605.03324 · cs.CV · May 12, 2016 · 2 cites

TL;DR

This paper introduces an unsupervised method that parses instructional videos into semantic steps using both visual and language cues, producing a storyline with a textual description for each step. It is evaluated on a large number of YouTube videos.

Contribution

It presents a novel unsupervised approach combining visual and language data to discover semantic steps and generate descriptions in instructional videos.

Findings

01 Successfully discovers semantically correct instructions
02 Works on large-scale YouTube videos
03 Provides textual descriptions for each step

Abstract

Human communication takes many forms, including speech, text and instructional videos. It typically has an underlying structure, with a starting point, ending, and certain objective steps between them. In this paper, we consider instructional videos, of which there are tens of millions on the Internet. We propose a method for parsing a video into such semantic steps in an unsupervised way. Our method is capable of providing a semantic "storyline" of the video composed of its objective steps. We accomplish this using both visual and language cues in a joint generative model. Our method can also provide a textual description for each of the identified semantic steps and video segments. We evaluate our method on a large number of complex YouTube videos and show that it discovers semantically correct instructions for a variety of tasks.

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Human Motion and Animation