Learnable Pooling Methods for Video Classification

Sebastian Kmiec; Juhan Bae; Ruijian An

arXiv:1810.00530 · cs.CV · October 2, 2018 · 1 citation

TL;DR

This paper proposes learnable pooling methods that use attention mechanisms and function approximation to aggregate frame-level video and audio descriptors for video classification. The resulting architectures reach accuracy close to the state of the art within the budget constraints of the 2nd YouTube-8M Video Understanding Challenge.

Contribution

It introduces novel learnable pooling architectures using attention and function approximation for improved video descriptor aggregation.

Findings

01 Achieved test accuracy similar to the state of the art while meeting budget constraints

02 Demonstrated the approach in the 2nd YouTube-8M Video Understanding Challenge

03 Released open-source model implementations

Abstract

We introduce modifications to state-of-the-art approaches to aggregating local video descriptors by using attention mechanisms and function approximations. Rather than using ensembles of existing architectures, we provide insight into creating new architectures. We demonstrate our solutions in "The 2nd YouTube-8M Video Understanding Challenge" by using frame-level video and audio descriptors. We obtain testing accuracy similar to the state of the art while meeting budget constraints, and touch upon strategies to improve the state of the art. Model implementations are available at https://github.com/pomonam/LearnablePoolingMethods.
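The idea of aggregating frame-level descriptors with an attention mechanism can be illustrated with a minimal sketch. This is not the authors' architecture: the function name `attention_pool`, the scoring vector `w`, and the array shapes are assumptions chosen for illustration, and the random `w` stands in for a parameter that would be learned during training. The sketch scores each frame, turns the scores into softmax weights, and returns the weighted average as a single video-level descriptor.

```python
import numpy as np

def attention_pool(frames, w):
    """Pool a (num_frames, dim) array into one (dim,) video descriptor.

    frames: frame-level descriptors, one row per frame.
    w: hypothetical scoring vector of shape (dim,); learned in practice.
    """
    scores = frames @ w                       # one scalar score per frame
    scores = scores - scores.max()            # stabilise the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ frames                   # attention-weighted average

# Illustrative shapes only: 300 frames of 1152-dimensional descriptors.
rng = np.random.default_rng(0)
frames = rng.normal(size=(300, 1152))
w = rng.normal(size=1152) * 0.01              # stand-in for a learned parameter
video_vec = attention_pool(frames, w)
```

With `w` set to all zeros, every frame receives the same weight and the function reduces to plain average pooling, which is the fixed baseline that learnable pooling generalises.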

Code & Models

Repositories

pomonam/LearnablePoolingMethods (TensorFlow, official)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques