Large-scale Video Classification guided by Batch Normalized LSTM Translator

Jae Hyeon Yoo

arXiv:1707.04045 · cs.CV · July 14, 2017 · 6 cites

Open Access

TL;DR

This paper introduces a multi-label video classification method that treats labels as words and uses batch-normalized LSTM networks, and it reports improved results on the large-scale YouTube-8M dataset.

Contribution

It proposes a new LSTM-based approach to large-scale multi-label video classification that views labels as words and combines stochastic gating with batch normalization.
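To make the "labels as words" idea concrete, here is a minimal sketch of how a video's label set might be turned into a token sequence that a sequence model could emit, and then decoded back into a multi-hot vector. The special tokens `BOS` and `EOS`, the sorting by label id, and both function names are assumptions made for illustration; the paper's actual encoding may differ.

```python
from typing import Iterable, List

# Hypothetical special tokens; they are not taken from the paper.
BOS, EOS = 0, 1
NUM_SPECIAL = 2


def labels_to_sentence(label_ids: Iterable[int]) -> List[int]:
    """Encode a set of label ids as a token sequence: BOS, labels..., EOS."""
    # Sorting by id is an assumed canonical word order.
    tokens = [lab + NUM_SPECIAL for lab in sorted(set(label_ids))]
    return [BOS] + tokens + [EOS]


def sentence_to_multi_hot(tokens: Iterable[int], num_labels: int) -> List[int]:
    """Decode a token sequence back into a multi-hot label vector."""
    multi_hot = [0] * num_labels
    for tok in tokens:
        if tok == EOS:
            break  # stop reading at the end-of-sentence token
        if tok >= NUM_SPECIAL:
            multi_hot[tok - NUM_SPECIAL] = 1
    return multi_hot
```

For example, `labels_to_sentence([5, 2, 5])` yields `[0, 4, 7, 1]`, and decoding that sequence with `num_labels=8` sets positions 2 and 5.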

Findings

01. Improved validation accuracy on the YouTube-8M dataset.

02. Effective use of batch normalization in LSTM models.

03. Potential for combining with other classifiers.

Abstract

The YouTube-8M dataset advances the development of large-scale video recognition technology, much as the ImageNet dataset spurred progress in image classification, recognition, and detection. Classifying the very large number of labels in this multi-label video dataset is a challenging task. Changing perspective, we propose a novel method that regards labels as words. In detail, we describe online learning approaches to multi-label video classification that are guided by deep recurrent neural networks acting as a video-to-sentence translator. We designed the translator based on LSTMs and found that stochastic gating before the input of each LSTM cell helps in designing the structural details. In addition, we adopted batch normalization to improve our LSTM models. Since our models are feature extractors, they can be used with other classifiers. Finally…
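As a rough illustration of the ingredients the abstract names, the NumPy sketch below runs one step of an LSTM cell whose pre-activations are batch normalized. It is not the author's implementation. Normalizing the input-to-hidden and hidden-to-hidden terms separately is one common formulation and is assumed here. The "stochastic gating" is modeled as a Bernoulli mask on the cell input, which is only one possible reading of the abstract. All function names, parameter names, and initial values are hypothetical.

```python
import numpy as np


def batch_norm(x, gamma, eps=1e-5):
    """Normalize each feature over the batch axis (training-mode statistics only)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_dim, hidden_dim, rng):
    """Small random weights; the scale values are illustrative assumptions."""
    return {
        "W_x": rng.normal(0.0, 0.1, (input_dim, 4 * hidden_dim)),
        "W_h": rng.normal(0.0, 0.1, (hidden_dim, 4 * hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
        "gamma_x": np.full(4 * hidden_dim, 0.1),
        "gamma_h": np.full(4 * hidden_dim, 0.1),
        "gamma_c": np.full(hidden_dim, 0.1),
    }


def bn_lstm_step(x, h, c, params, rng, keep_prob=1.0):
    """One LSTM step. x: (batch, input_dim); h, c: (batch, hidden_dim)."""
    # Assumed form of stochastic gating: random Bernoulli mask on the input.
    if keep_prob < 1.0:
        mask = rng.random(x.shape) < keep_prob
        x = x * mask / keep_prob
    # Batch-normalize the two linear terms separately, then add the bias.
    pre = (batch_norm(x @ params["W_x"], params["gamma_x"])
           + batch_norm(h @ params["W_h"], params["gamma_h"])
           + params["b"])
    i, f, o, g = np.split(pre, 4, axis=1)  # input, forget, output gates; candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(batch_norm(c_new, params["gamma_c"]))
    return h_new, c_new
```

The sketch uses batch statistics only, so it corresponds to training mode; an inference-time version would need running averages of the mean and variance, which are omitted here for brevity.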

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory