VideoGraph: Recognizing Minutes-Long Human Activities in Videos

Noureldien Hussein; Efstratios Gavves; Arnold W.M. Smeulders

arXiv:1905.05143 · cs.CV · October 15, 2019 · 47 citations

TL;DR

VideoGraph is a graph-based method that represents minutes-long human activities in videos and learns their temporal structure directly from data, improving over related methods on the Epic-Kitchen and Breakfast benchmarks.

Contribution

The paper introduces a fully data-driven graph representation for long-duration activities. The graph, its nodes, and its edges are learned from video, so temporal dependencies are captured without node-level annotations.

Findings

01 Outperforms related methods on the Epic-Kitchen and Breakfast datasets

02 Successfully models minutes-long temporal dependencies

03 Learns activity structure directly from video data

Abstract

Many human activities take minutes to unfold. To represent them, some related works opt for statistical pooling, which neglects the temporal structure. Others opt for convolutional methods, such as CNN and Non-Local. While successful in learning temporal concepts, these methods fall short of modeling minutes-long temporal dependencies. We propose VideoGraph, a method that achieves the best of both worlds: it represents minutes-long human activities and learns their underlying temporal structure. VideoGraph learns a graph-based representation for human activities. The graph, its nodes, and its edges are learned entirely from video datasets, making VideoGraph applicable to problems without node-level annotation. The result is improvements over related works on two benchmarks: Epic-Kitchen and Breakfast. In addition, we demonstrate that VideoGraph is able to learn the temporal structure of human activities in minutes-long videos.
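The contrast drawn in the abstract between statistical pooling and a graph-based representation can be illustrated with a small sketch. This is not the VideoGraph architecture: the helper names (`mean_pool`, `soft_assign`, `node_transitions`), the dot-product similarity, and the randomly initialized "nodes" are all assumptions made for illustration, whereas in the paper the nodes and edges are learned from data. The sketch only shows that mean pooling gives the same result when the segment order is reversed, while a node-to-node transition summary does depend on the order.

```python
import math
import random


def mean_pool(segments):
    """Statistical pooling: average the segment features over time."""
    dim = len(segments[0])
    return [sum(seg[d] for seg in segments) / len(segments) for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_assign(segments, nodes):
    """Softly assign each video segment to the latent nodes (dot-product similarity)."""
    return [
        softmax([sum(a * b for a, b in zip(seg, node)) for node in nodes])
        for seg in segments
    ]


def node_transitions(assignments):
    """Accumulate soft transition weights between nodes at consecutive timesteps."""
    n = len(assignments[0])
    edges = [[0.0] * n for _ in range(n)]
    for prev, nxt in zip(assignments, assignments[1:]):
        for i in range(n):
            for j in range(n):
                edges[i][j] += prev[i] * nxt[j]
    return edges


random.seed(0)
# Hypothetical data: 6 video segments with 4-dim features, 3 latent nodes.
segments = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
nodes = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
reversed_segments = segments[::-1]

# Mean pooling cannot tell the two orderings apart.
pooled_same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(mean_pool(segments), mean_pool(reversed_segments))
)

# The transition summary is order-sensitive: reversing time transposes it.
edges_fwd = node_transitions(soft_assign(segments, nodes))
edges_rev = node_transitions(soft_assign(reversed_segments, nodes))

print("pooled features identical:", pooled_same)
print("forward edges:", edges_fwd)
print("reversed edges:", edges_rev)
```

Reversing the segment order leaves the pooled vector unchanged but transposes the transition matrix, which is one simple way to see why an order-aware structure can carry information that pooling discards.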

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods