Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset

Seil Na; Youngjae Yu; Sangho Lee; Jisung Kim; Gunhee Kim

arXiv:1706.07960 · cs.CV · July 13, 2017 · 6 cites

TL;DR

This paper presents a deep neural network approach to multi-label video classification on the YouTube-8M dataset. It addresses temporal modeling, label imbalance, and label correlations, and an ensemble of the proposed models placed 8th in the Kaggle competition.

Contribution

It introduces a deep neural network model built from four components (a frame encoder, a classification layer, a label processing layer, and a loss function), with newly proposed methods tailored to multi-label video classification on YouTube-8M.

Findings

01. Proposed models outperform baseline models significantly.

02. Ensemble approach achieved 8th place in Kaggle competition.

03. Effective handling of label correlations and imbalances.

Abstract

YouTube-8M is the largest video dataset for multi-label video classification. To tackle multi-label classification on this challenging dataset, several issues must be addressed, such as temporal modeling of videos, label imbalance, and correlations between labels. We develop a deep neural network model consisting of four components: the frame encoder, the classification layer, the label processing layer, and the loss function. We introduce our newly proposed methods and discuss how existing models operate in the YouTube-8M Classification Task, what insights they offer, and why they succeed (or fail) to achieve good performance. Most of our proposed models perform substantially better than the baseline models, and our ensemble of models ranked 8th in the Kaggle competition.
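As a rough illustration of how three of the four components named in the abstract fit together, the following is a minimal sketch and not the authors' implementation. It assumes mean pooling as a stand-in frame encoder, a single linear layer with independent sigmoid outputs as the classification layer, and a positively weighted binary cross-entropy as one common way to counter label imbalance. The label processing layer is omitted, and all function names, weights, and the `pos_weight` parameter are hypothetical.

```python
import math


def mean_pool(frames):
    """Stand-in frame encoder (assumption): average frame features into one video vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify(video_vec, weights, biases):
    """Classification layer sketch: one independent sigmoid score per label."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, video_vec)) + b)
        for row, b in zip(weights, biases)
    ]


def weighted_bce(probs, targets, pos_weight=1.0, eps=1e-7):
    """Binary cross-entropy averaged over labels.

    pos_weight > 1 upweights positive labels (hypothetical imbalance remedy).
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


# Toy video: two frames with two-dimensional features, three candidate labels.
frames = [[1.0, 0.0], [3.0, 2.0]]
video_vec = mean_pool(frames)
weights = [[1.0, -1.0], [0.0, 0.0], [-1.0, 0.0]]  # made-up parameters
biases = [0.0, 0.0, 0.0]
probs = classify(video_vec, weights, biases)
loss = weighted_bce(probs, [1, 0, 0], pos_weight=2.0)
```

Because each label gets its own sigmoid rather than sharing a softmax, a video can receive several labels at once, which is what distinguishes the multi-label setting from single-label classification.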

Code & Models

Repositories

seilna/youtube-8m · TensorFlow · Official

Taxonomy

Topics: Music and Audio Processing · Video Analysis and Summarization · Multimodal Machine Learning Applications