Weakly supervised learning of actions from transcripts

Hilde Kuehne; Alexander Richard; Juergen Gall

arXiv:1610.02237·cs.CV·June 20, 2017·5 cites

Open Access

TL;DR

This paper introduces a weakly supervised method for learning human actions in videos from transcripts alone (ordered lists of the actions in each video). It localizes and classifies actions without frame-level annotations and is evaluated on four activity datasets.

Contribution

The authors propose a novel approach that infers action models from transcript sequences, eliminating the need for detailed frame annotations and improving transcript-video alignment.

Findings

01 Achieves competitive action localization and classification

02 Outperforms state-of-the-art transcript alignment methods

03 Effective across diverse activity datasets

Abstract

We present an approach for weakly supervised learning of human actions from video transcriptions. Our system is based on the idea that, given a sequence of input data and a transcript, i.e. a list of the order the actions occur in the video, it is possible to infer the actions within the video stream, and thus, learn the related action models without the need for any frame-based annotation. Starting from the transcript information at hand, we split the given data sequences uniformly based on the number of expected actions. We then learn action models for each class by maximizing the probability that the training video sequences are generated by the action models given the sequence order as defined by the transcripts. The learned model can be used to temporally segment an unseen video with or without transcript. We evaluate our approach on four distinct activity datasets, namely…
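The two steps named in the abstract, a uniform initial split followed by an alignment that respects the transcript order, can be sketched as below. This is a minimal illustration and not the authors' implementation. The function names `uniform_split` and `forced_align` are hypothetical, and the per-frame class log-scores passed to `forced_align` are an assumed stand-in for whatever action models are being trained.

```python
import numpy as np


def uniform_split(num_frames, transcript):
    """Initial labeling: cut the video into len(transcript) equal parts."""
    n = len(transcript)
    # Frame t falls into segment floor(t * n / num_frames).
    positions = (np.arange(num_frames) * n) // num_frames
    return [transcript[p] for p in positions]


def forced_align(log_probs, transcript):
    """Best frame labeling that follows the transcript order exactly.

    log_probs: array of shape (num_frames, num_classes) holding assumed
               per-frame class log-scores (hypothetical model output).
    transcript: ordered list of class indices.
    """
    num_frames = log_probs.shape[0]
    n = len(transcript)
    if num_frames < n:
        raise ValueError("need at least one frame per transcript entry")

    # score[t, s]: best score of frames 0..t with frame t at transcript position s.
    score = np.full((num_frames, n), -np.inf)
    # stay[t, s]: True if the best path kept position s from frame t-1 to t.
    stay = np.zeros((num_frames, n), dtype=bool)
    score[0, 0] = log_probs[0, transcript[0]]

    for t in range(1, num_frames):
        for s in range(n):
            keep = score[t - 1, s]
            move = score[t - 1, s - 1] if s > 0 else -np.inf
            stay[t, s] = keep >= move
            score[t, s] = max(keep, move) + log_probs[t, transcript[s]]

    # Backtrack from the last frame, which must sit at the last position.
    s = n - 1
    positions = [s]
    for t in range(num_frames - 1, 0, -1):
        if not stay[t, s]:
            s -= 1
        positions.append(s)
    positions.reverse()
    return [transcript[p] for p in positions]
```

In a training loop of this kind, one would presumably start from `uniform_split`, fit the action models on those labels, realign with `forced_align`, and repeat. The abstract does not spell out that loop, so treat it as an assumption.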

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications