Compositional Structure Learning for Sequential Video Data

Kyoung-Woon On; Eun-Sol Kim; Yu-Jung Heo; Byoung-Tak Zhang

arXiv:1907.01709 · cs.LG · July 4, 2019 · 1 cites

Open Access

TL;DR

This paper introduces Temporal Dependency Networks (TDNs), a graph-based approach that captures complex, multilevel semantic temporal dependencies in videos, outperforming conventional sequential models.

Contribution

The paper proposes TDNs, a novel graph-based model that learns complex, multilevel temporal dependencies in videos, addressing limitations of traditional sequential learning methods.

Findings

01. Efficiently learns complex semantic structures in videos.

02. Outperforms conventional sequential models on YouTube-8M.

03. Demonstrates the effectiveness of graph-based dependency learning.

Abstract

Conventional sequential learning methods such as Recurrent Neural Networks (RNNs) focus on interactions between consecutive inputs, i.e., first-order Markovian dependency. However, most sequential data, such as videos, have complex temporal dependencies that imply variable-length semantic flows and their compositions, which are hard for conventional methods to capture. Here, we propose Temporal Dependency Networks (TDNs), which learn video data by discovering these complex structures. A TDN represents a video as a graph whose nodes and edges correspond to the frames of the video and their dependencies, respectively. Via a parameterized kernel with graph-cut and graph convolutions, TDNs find compositional temporal dependencies of the data in multilevel graph forms. We evaluate the proposed method on the large-scale video dataset YouTube-8M. The experimental…
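The graph view described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a scaled dot-product kernel with a sigmoid as the "parameterized kernel", uses a standard symmetrically normalized graph convolution, and omits the graph-cut step entirely. All names (`dependency_graph`, `graph_convolution`, `w_query`, `w_key`, `w_out`) and the random weights are hypothetical stand-ins for learned parameters.

```python
import numpy as np


def dependency_graph(frames, w_query, w_key):
    """Build a dense frame-to-frame adjacency matrix from a parameterized kernel.

    Assumption: a scaled dot-product kernel followed by a sigmoid; the paper's
    actual kernel may differ.
    """
    q = frames @ w_query
    k = frames @ w_key
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = 0.5 * (scores + scores.T)      # symmetrize: undirected dependencies
    return 1.0 / (1.0 + np.exp(-scores))    # squash edge weights into [0, 1]


def graph_convolution(adj, frames, w_out):
    """One graph convolution: normalized neighborhood averaging, then ReLU."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ frames @ w_out, 0.0)


# Toy data: 6 frames with 8-dimensional features (random, for illustration only).
rng = np.random.default_rng(0)
num_frames, feat_dim, hidden_dim = 6, 8, 4
frames = rng.normal(size=(num_frames, feat_dim))
w_query = rng.normal(size=(feat_dim, hidden_dim))
w_key = rng.normal(size=(feat_dim, hidden_dim))
w_out = rng.normal(size=(feat_dim, hidden_dim))

adj = dependency_graph(frames, w_query, w_key)   # nodes = frames, edges = dependencies
hidden = graph_convolution(adj, frames, w_out)   # frame features mixed along the edges
print(adj.shape, hidden.shape)
```

Because the adjacency is dense, every frame can exchange information with every other frame in a single layer, which is the contrast with the first-order Markovian dependency of an RNN that the abstract draws.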

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Anomaly Detection Techniques and Applications