Constrained-size Tensorflow Models for YouTube-8M Video Understanding Challenge

Tianqi Liu; Bo Liu

arXiv:1808.06739 · cs.CV · November 12, 2018

TL;DR

This paper describes a constrained-size TensorFlow ensemble for YouTube-8M video classification, built on the Gated NetVLAD architecture, that reaches 88.324% GAP on the private leaderboard while storing weights in float16 for a 48.5% compression rate.

Contribution

It trains four Gated NetVLAD models separately in float32, converts their weights to float16, and combines them in a single TensorFlow graph to meet the competition's model-size constraint.

Findings

01 Achieved 88.324% GAP on the private leaderboard

02 Realized a 48.5% model size reduction with no accuracy loss

03 Utilized an ensemble of four models based on the Gated NetVLAD architecture
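
The GAP (Global Average Precision) figure above is the challenge's ranking metric. The following is a minimal sketch of how such a score can be computed, assuming the commonly published YouTube-8M definition: keep the top 20 predictions per video, pool them across videos, sort by confidence, and accumulate precision at each correct prediction, normalized by the total number of ground-truth labels. The function name, argument layout, and toy data are illustrative assumptions and are not taken from the authors' code.

```python
def global_average_precision(predictions, labels, top_k=20):
    """Sketch of GAP (assumed definition, not the authors' implementation).

    predictions: list of per-video score lists, one score per class.
    labels: list of per-video sets holding the positive class indices.
    """
    pooled = []            # (score, is_positive) pairs pooled over all videos
    total_positives = 0    # every ground-truth label counts in the denominator
    for scores, positives in zip(predictions, labels):
        total_positives += len(positives)
        # Keep only the top_k highest-scoring classes for this video.
        top = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:top_k]
        pooled.extend((scores[c], c in positives) for c in top)

    if total_positives == 0:
        return 0.0

    # Rank all pooled predictions by confidence, highest first.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(pooled, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank   # precision at this recall step
    return precision_sum / total_positives


# Toy example with two videos and three classes (hypothetical data).
toy_predictions = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
toy_labels = [{0}, {1, 2}]
toy_gap = global_average_precision(toy_predictions, toy_labels, top_k=2)
```

Because the predictions are pooled before ranking, a confident mistake on one video lowers the precision credited to correct predictions on every other video ranked below it.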

Abstract

This paper presents our 7th-place solution to the second YouTube-8M video understanding competition, which challenges participants to build a constrained-size model to classify millions of YouTube videos into thousands of classes. Our final model consists of four single models aggregated into one TensorFlow graph. For each single model, we use the same network architecture as the winning solution of the first YouTube-8M video understanding competition, namely Gated NetVLAD. We train the single models separately in TensorFlow's default float32 precision, then replace the weights with float16 precision and ensemble them in the evaluation and inference stages, achieving a 48.5% compression rate without loss of precision. Our best model achieved 88.324% GAP on the private leaderboard. The code is publicly available at https://github.com/boliu61/youtube-8m
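
The float16 step and the ensembling described in the abstract can be illustrated with a small, standard-library-only sketch. It is not the authors' TensorFlow code: the helper names are hypothetical, the weights are toy values, and the equal-weight average is an assumption, since the abstract does not say how the four models are combined. It shows only that storing weights in half precision halves the raw weight bytes at the cost of a small rounding error; the 48.5% figure reported for the full model is not reproduced here.

```python
import struct


def to_float32_bytes(weights):
    """Serialize weights as little-endian float32 (4 bytes each)."""
    return struct.pack(f"<{len(weights)}f", *weights)


def to_float16_bytes(weights):
    """Serialize weights as little-endian float16 (2 bytes each)."""
    return struct.pack(f"<{len(weights)}e", *weights)


def from_float16_bytes(blob):
    """Read float16 bytes back into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


def ensemble_average(per_model_probs):
    """Average class probabilities across models (equal weights assumed)."""
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


# Toy weights (hypothetical values, not taken from any trained model).
toy_weights = [0.5, -1.25, 0.1]
blob32 = to_float32_bytes(toy_weights)
blob16 = to_float16_bytes(toy_weights)
restored = from_float16_bytes(blob16)

# Largest rounding error introduced by the float16 round trip.
max_error = max(abs(a - b) for a, b in zip(toy_weights, restored))

# Combine the outputs of two hypothetical models on a two-class problem.
averaged = ensemble_average([[0.2, 0.8], [0.4, 0.6]])
```

Values that are exactly representable in half precision, such as 0.5 and -1.25, survive the round trip unchanged, while 0.1 picks up a small rounding error.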

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning