Hierarchical Deep Recurrent Architecture for Video Understanding

Luming Tang; Boyang Deng; Haiyu Zhao; Shuai Yi

arXiv:1707.03296·cs.CV·July 12, 2017

TL;DR

This paper presents a hierarchical deep recurrent architecture for large-scale multi-label video classification. It combines frame-level sequence modeling, attention pooling, and model ensembling, and is evaluated on the YouTube-8M benchmark.

Contribution

The paper introduces a hierarchical deep recurrent framework that uses attention pooling and model ensembling to improve multi-label video classification.

Findings

01

Achieved a GAP score of 0.84346 on the public test set.

02

Developed two attention pooling methods (ATT and Multi-ATT) that weight frames by how informative they are.

03

An ensemble of 18 models outperforms the individual models.

Abstract

This paper introduces the system we developed for the YouTube-8M Video Understanding Challenge, in which a large-scale benchmark dataset was used for multi-label video classification. The proposed framework has a hierarchical deep architecture consisting of a frame-level sequence modeling part and a video-level classification part. In the frame-level sequence modeling part, we explore a set of methods, including Pooling-LSTM (PLSTM), Hierarchical-LSTM (HLSTM), and Random-LSTM (RLSTM), to address the problem of the large number of frames in a video. We also introduce two attention pooling methods, single attention pooling (ATT) and multiply attention pooling (Multi-ATT), so that the model pays more attention to the informative frames in a video and ignores the uninformative ones. In the video-level classification part, two methods are proposed to increase the classification performance, i.e.…
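To make the idea of attention pooling concrete, the following is a minimal sketch of a single attention pooling step, not the authors' implementation. It assumes each frame is a fixed-length feature vector and that a frame's importance is scored by a dot product with a scoring vector `w`, which in a real model would be learned. The names `attention_pool`, `softmax`, and `w` are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Pool T frame vectors into one video vector.

    frames: list of T feature vectors, each of length D.
    w: hypothetical scoring vector of length D (an assumption; it would
       be a learned parameter in a trained model).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar score per frame: dot product of the frame with w.
    scores = [sum(f * wi for f, wi in zip(frame, w)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of the frames, so high-scoring frames dominate.
    pooled = [
        sum(weights[t] * frames[t][d] for t in range(len(frames)))
        for d in range(dim)
    ]
    return pooled, weights


# Toy example: three frames with two-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [2.0, 0.0]  # this scoring vector favors frames with a large first feature
pooled, weights = attention_pool(frames, w)
```

With an all-zero `w`, every frame gets the same weight and the result reduces to plain average pooling, which is a useful sanity check. A multi-head variant could be sketched by running this step with several scoring vectors and concatenating the pooled vectors, though whether Multi-ATT works this way is not stated in the text above.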

Code & Models

Repositories

Tsingularity/youtube-8m (TensorFlow, official)

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Music and Audio Processing