Hierarchical Label Inference for Video Classification

Nelson Nauata; Jonathan Smith; Greg Mori

arXiv:1706.05028 · cs.CV · January 23, 2018 · 2 cites

TL;DR

This paper applies the Bidirectional Inference Neural Network (BINN) to graph-based inference in label space, using the hierarchy between labels to improve large-scale video classification. It reports significant improvements over baseline models on the first and second releases of YouTube-8M.

Contribution

It uses the BINN to perform graph-based inference over labels, exploiting the inherent hierarchy between labels at increasing granularity to improve video classification.

Findings

01. BINN outperforms baseline models on the first and second releases of YouTube-8M.

02. Hierarchical label inference improves classification performance.

03. BINN captures label dependencies across multiple levels of granularity.

Abstract

Videos are a rich source of high-dimensional structured data, with a wide range of interacting components at varying levels of granularity. In order to improve understanding of unconstrained internet videos, it is important to consider the role of labels at separate levels of abstraction. In this paper, we consider the use of the Bidirectional Inference Neural Network (BINN) for performing graph-based inference in label space for the task of video classification. We take advantage of the inherent hierarchy between labels at increasing granularity. The BINN is evaluated on the first and second releases of the YouTube-8M large-scale multilabel video dataset. Our results demonstrate the effectiveness of BINN, achieving significant improvements over baseline models.
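
The idea of bidirectional inference over a label hierarchy can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the function names, the additive message form, the averaging step, and the hand-set weights below are all assumptions made for illustration (in a trained model the inter-level weights would be learned). The sketch runs one coarse-to-fine pass and one fine-to-coarse pass over per-level label scores and then combines them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # W is a list of rows; returns W @ v as a plain list.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def bidirectional_inference(level_scores, down_weights, up_weights):
    """Refine per-level label scores using both directions of a hierarchy.

    level_scores: list of score lists, ordered coarse to fine (hypothetical input).
    down_weights[i]: matrix mapping level i scores to level i+1 (assumed form).
    up_weights[i]: matrix mapping level i+1 scores to level i (assumed form).
    """
    n = len(level_scores)

    # Coarse-to-fine pass: each level adds a message from the level above it.
    down = [list(level_scores[0])]
    for i in range(1, n):
        msg = matvec(down_weights[i - 1], down[i - 1])
        down.append([s + m for s, m in zip(level_scores[i], msg)])

    # Fine-to-coarse pass: each level adds a message from the level below it.
    up = [None] * n
    up[n - 1] = list(level_scores[n - 1])
    for i in range(n - 2, -1, -1):
        msg = matvec(up_weights[i], up[i + 1])
        up[i] = [s + m for s, m in zip(level_scores[i], msg)]

    # Combine both passes (simple average, an assumption) and squash to (0, 1).
    return [
        [sigmoid(0.5 * (d + u)) for d, u in zip(d_level, u_level)]
        for d_level, u_level in zip(down, up)
    ]


# Toy hierarchy with made-up labels: one coarse label ("Sports") over two
# fine labels ("Soccer", "Tennis"). All numbers are illustrative only.
scores = [[2.0], [0.0, 0.0]]
down_w = [[[1.0], [0.0]]]   # coarse -> fine, shape 2 x 1
up_w = [[[0.5, 0.5]]]       # fine -> coarse, shape 1 x 2
probs = bidirectional_inference(scores, down_w, up_w)
```

In this toy setup the confident coarse label raises the score of the first fine label through the coarse-to-fine weights, while the second fine label, which receives no message, stays at its prior.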

Taxonomy

Topics: Human Pose and Action Recognition · Music and Audio Processing · Video Analysis and Summarization