Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling

Choongyeun Cho; Benjamin Antin; Sanchit Arora; Shwan Ashrafi; Peilin Duan; Dang The Huynh; Lee James; Hang Tuan Nguyen; Mojtaba Solgi; Cuong Van Than

arXiv:1809.07895·cs.CV·September 24, 2018

Open Access

TL;DR

This paper describes Axon AI's approach to large-scale video classification, which combines feature space augmentation, learned label relations, and ensembling, and which placed 3rd in the 2nd YouTube-8M Video Understanding Challenge.

Contribution

The paper introduces a novel combination of feature space augmentation, label relation regularization, and learned ensembling to improve video classification accuracy.

Findings

01

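Achieved 88.733% GAP on the private test set, ranking 3rd among 394 teams when the model size constraint is not considered; a model that meets the size requirement reached 87.287%.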

02

Over- and sub-sampling of the data in feature space was explored and employed to further improve performance.

03

Ensembling and label relation regularization contributed to performance gains.
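The GAP figures above are global average precision scores. As a rough illustration of how such a score can be computed, here is a minimal sketch that assumes the commonly described YouTube-8M convention: keep the top 20 predictions per video, pool them across all videos, rank by confidence, and average the precision at each correct hit over the total number of ground-truth labels. The function name, argument layout, and toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set


def global_average_precision(
    scores: Sequence[Sequence[float]],
    labels: Sequence[Set[int]],
    top_k: int = 20,
) -> float:
    """Sketch of GAP: pool each video's top_k predictions, then compute AP.

    scores[v][c] is the predicted confidence of class c for video v.
    labels[v] is the set of ground-truth class indices for video v.
    top_k=20 is an assumption based on the usual challenge convention.
    """
    pooled: List[tuple] = []
    total_positives = 0
    for video_scores, video_labels in zip(scores, labels):
        total_positives += len(video_labels)
        # Indices of the top_k highest-scoring classes for this video.
        ranked = sorted(
            range(len(video_scores)),
            key=lambda c: video_scores[c],
            reverse=True,
        )[:top_k]
        for c in ranked:
            pooled.append((video_scores[c], c in video_labels))

    if total_positives == 0:
        return 0.0

    # Rank all pooled predictions by confidence, highest first.
    pooled.sort(key=lambda item: item[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(pooled, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_positives


if __name__ == "__main__":
    toy_scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
    toy_labels = [{0}, {1}]
    print(global_average_precision(toy_scores, toy_labels, top_k=2))
```

Because predictions from all videos are pooled before ranking, a model is rewarded for confidences that are calibrated across videos, not only for ordering the labels correctly within each video.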

Abstract

This paper presents Axon AI's solution to the 2nd YouTube-8M Video Understanding Challenge, which achieved a final global average precision (GAP) of 88.733% on the private test set (ranked 3rd among 394 teams, not considering the model size constraint), and 87.287% using a model that meets the size requirement. Two sets of 7 individual models belonging to 3 different families were trained separately. The inference results of these models on the training data were then aggregated and used to train a compact model that meets the model size requirement. To further improve performance, we explored and employed data over/sub-sampling in feature space, an additional regularization term during training that exploits label relationships, and learned weights for ensembling the individual models.
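The abstract mentions learned weights for ensembling but gives no details here, so the following is only one plausible sketch of the idea, not the authors' procedure. It assumes the ensemble is a convex combination of per-model probabilities, parameterizes the weights through a softmax, and fits them by plain gradient descent on binary cross-entropy. All names, hyperparameters, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(theta: Sequence[float]) -> List[float]:
    """Numerically stable softmax, used to keep weights positive and summing to 1."""
    peak = max(theta)
    exps = [math.exp(t - peak) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def learn_ensemble_weights(
    preds: Sequence[Sequence[float]],
    targets: Sequence[int],
    steps: int = 500,
    lr: float = 0.5,
    eps: float = 1e-7,
) -> List[float]:
    """Fit convex ensemble weights by gradient descent on binary cross-entropy.

    preds[m][i] is model m's probability for example i (assumed layout).
    targets[i] is the 0/1 ground truth for example i.
    """
    n_models = len(preds)
    n_examples = len(targets)
    theta = [0.0] * n_models  # start from uniform weights

    for _ in range(steps):
        weights = softmax(theta)
        grad_w = [0.0] * n_models
        for i, y in enumerate(targets):
            p = sum(weights[m] * preds[m][i] for m in range(n_models))
            p = min(max(p, eps), 1.0 - eps)  # avoid division by zero
            # Derivative of binary cross-entropy with respect to p.
            dloss_dp = (p - y) / (p * (1.0 - p))
            for m in range(n_models):
                grad_w[m] += dloss_dp * preds[m][i] / n_examples
        # Chain rule through the softmax parameterization.
        weighted_mean = sum(weights[m] * grad_w[m] for m in range(n_models))
        theta = [
            theta[k] - lr * weights[k] * (grad_w[k] - weighted_mean)
            for k in range(n_models)
        ]
    return softmax(theta)


if __name__ == "__main__":
    toy_targets = [1, 0, 1, 0]
    accurate_model = [0.9, 0.1, 0.8, 0.2]
    weak_model = [0.5, 0.5, 0.4, 0.6]
    print(learn_ensemble_weights([accurate_model, weak_model], toy_targets))
```

In this toy setup the fitted weights shift toward the more accurate model. In practice such weights would be fit on held-out predictions rather than on the data the individual models were trained on, to avoid rewarding overfit models.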

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications