Learning To Recognize Procedural Activities with Distant Supervision

Xudong Lin; Fabio Petroni; Gedas Bertasius; Marcus Rohrbach; Shih-Fu Chang; Lorenzo Torresani

arXiv:2201.10990 · cs.CV · June 20, 2022 · 1 citation

TL;DR

This paper introduces a method for classifying complex, multi-step activities in long videos by automatically identifying steps through distant supervision from a textual knowledge base, improving generalization across various tasks.

Contribution

The paper presents a novel approach that leverages distant supervision from wikiHow to automatically label steps in instructional videos, enabling training without manual annotations.
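The idea of labeling steps from a textual knowledge base can be illustrated with a minimal sketch. It assumes each video segment comes with some associated text (for example, transcribed narration) and assigns the segment the most similar step description. The bag-of-words cosine similarity, the function names, and the toy data are all illustrative assumptions; this summary does not describe the matching model the authors actually use.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_step_labels(segment_texts, step_descriptions):
    """Give each segment the index of its most similar step description."""
    step_vectors = [bag_of_words(s) for s in step_descriptions]
    labels = []
    for text in segment_texts:
        vec = bag_of_words(text)
        scores = [cosine(vec, sv) for sv in step_vectors]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Hypothetical step descriptions standing in for knowledge-base entries.
steps = [
    "Crack the eggs into a bowl",
    "Whisk the eggs with salt",
    "Pour the mixture into a hot pan",
]
# Hypothetical text associated with consecutive video segments.
segments = [
    "first crack two eggs into the bowl",
    "now whisk them with a little salt",
    "pour everything into the hot pan",
]
pseudo_labels = assign_step_labels(segments, steps)
print(pseudo_labels)
```

The resulting indices serve as pseudo-labels that a video model could be trained to predict, so no manual temporal annotation is needed. A learned sentence encoder would likely match paraphrases far better than token overlap; the simple similarity here only keeps the sketch dependency-free.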

Findings

01. Models trained with automatically labeled steps outperform baselines.

02. The approach generalizes well to multiple downstream tasks.

03. Automatic step identification improves activity recognition accuracy.

Abstract

In this paper we consider the problem of classifying fine-grained, multi-step activities (e.g., cooking different recipes, making disparate home improvements, creating various forms of arts and crafts) from long videos spanning up to several minutes. Accurately categorizing these activities requires not only recognizing the individual steps that compose the task but also capturing their temporal dependencies. This problem is dramatically different from traditional action classification, where models are typically optimized on videos that span only a few seconds and that are manually trimmed to contain simple atomic actions. While step annotations could enable the training of models to recognize the individual steps of procedural activities, existing large-scale datasets in this area do not include such segment labels due to the prohibitive cost of manually annotating temporal boundaries…

Code & Models

Repositories

facebookresearch/video-distant-supervision (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Music and Audio Processing

Methods: Balanced Selection