Large-scale Video Classification guided by Batch Normalized LSTM Translator
Jae Hyeon Yoo

TL;DR
This paper introduces a multi-label video classification method that treats labels as words and uses Batch Normalized LSTM networks, and it reports improved results on the large-scale YouTube-8M dataset.
Contribution
It proposes a new LSTM-based approach with stochastic gating and batch normalization for large-scale multi-label video classification, viewing labels as words.
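To make the "labels as words" view concrete, here is a minimal sketch of how a video's unordered label set could be encoded as a token sequence for a sequence model and decoded back into a multi-hot vector. The helper names, the start/end tokens, and the alphabetical ordering are assumptions for illustration; the summary does not say how the paper orders or tokenizes labels.

```python
# Hypothetical encoding of multi-label targets as "sentences".
# Token names, ordering, and helper names are illustrative assumptions.
BOS, EOS = "<bos>", "<eos>"

def build_vocab(label_names):
    """Map the start/end tokens and each distinct label to an integer id."""
    tokens = [BOS, EOS] + sorted(set(label_names))
    return {tok: i for i, tok in enumerate(tokens)}

def labels_to_sentence(labels, vocab):
    """Encode an unordered label set as token ids (sorted for determinism)."""
    return [vocab[BOS]] + [vocab[name] for name in sorted(labels)] + [vocab[EOS]]

def sentence_to_multi_hot(token_ids, vocab):
    """Decode a token-id sequence back into a multi-hot vector over the vocabulary."""
    special = {vocab[BOS], vocab[EOS]}
    multi_hot = [0] * len(vocab)
    for token_id in token_ids:
        if token_id not in special:
            multi_hot[token_id] = 1
    return multi_hot

vocab = build_vocab(["Car", "Game", "Music"])
sentence = labels_to_sentence({"Music", "Car"}, vocab)
multi_hot = sentence_to_multi_hot(sentence, vocab)
```

Sorting the labels is only one way to fix an order; a real system might instead order them by frequency or by classifier confidence.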
Findings
Improved validation accuracy on the YouTube-8M dataset.
Effective use of batch normalization in LSTM models.
Potential for combining with other classifiers.
Abstract
The YouTube-8M dataset advances the development of large-scale video recognition technology, much as the ImageNet dataset has encouraged work on image classification, recognition, and detection. For a video dataset of this size, classifying a huge number of labels per video is a challenging task. Taking a different perspective, we propose a novel method that regards labels as words. In detail, we describe online learning approaches to multi-label video classification that are guided by deep recurrent neural networks acting as a video-to-sentence translator. We designed the translator based on LSTMs and found that stochastic gating before the input of each LSTM cell can help us design the structural details. In addition, we adopted batch normalization in our LSTM models to improve them. Since our models are feature extractors, they can be used with other classifiers. Finally…
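The two architectural ideas named in the abstract, a stochastic gate before each LSTM cell's input and batch normalization inside the LSTM, can be sketched as a single recurrent step. This is a rough illustration under stated assumptions, not the paper's implementation: the gate is modeled here as a Bernoulli (inverted-dropout) mask on the input, batch normalization is applied separately to the input-to-hidden and hidden-to-hidden terms and to the cell state (one common formulation), only training-mode batch statistics are used, and all names, sizes, and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis (training-mode statistics only)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def init_params(input_dim, hidden_dim, rng):
    """Small random weights; the 0.1 scale for gamma is an illustrative choice."""
    return {
        "W_x": rng.standard_normal((input_dim, 4 * hidden_dim)) * 0.1,
        "W_h": rng.standard_normal((hidden_dim, 4 * hidden_dim)) * 0.1,
        "b": np.zeros(4 * hidden_dim),
        "gamma_x": np.full(4 * hidden_dim, 0.1),
        "gamma_h": np.full(4 * hidden_dim, 0.1),
        "gamma_c": np.full(hidden_dim, 0.1),
        "beta_c": np.zeros(hidden_dim),
    }

def bn_lstm_step(x, h, c, params, rng, keep_prob=0.8, train=True):
    """One LSTM step with a Bernoulli input gate and batch-normalized pre-activations."""
    if train:
        # ASSUMPTION: "stochastic gating" is modeled as an inverted-dropout mask.
        mask = rng.binomial(1, keep_prob, size=x.shape) / keep_prob
        x = x * mask
    # Normalize the input and recurrent contributions separately, then add the bias.
    pre = (batch_norm(x @ params["W_x"], params["gamma_x"], 0.0)
           + batch_norm(h @ params["W_h"], params["gamma_h"], 0.0)
           + params["b"])
    i, f, o, g = np.split(pre, 4, axis=1)  # input, forget, output gates and candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(batch_norm(c_new, params["gamma_c"], params["beta_c"]))
    return h_new, c_new

rng = np.random.default_rng(0)
batch_size, input_dim, hidden_dim = 8, 16, 32  # hypothetical sizes
params = init_params(input_dim, hidden_dim, rng)
x = rng.standard_normal((batch_size, input_dim))
h = np.zeros((batch_size, hidden_dim))
c = np.zeros((batch_size, hidden_dim))
h_new, c_new = bn_lstm_step(x, h, c, params, rng)
```

Because the hidden state `h_new` is just a feature vector per video step, it could be passed to any downstream classifier, which is consistent with the abstract's remark that the models act as feature extractors.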
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
