Hierarchical Label Inference for Video Classification
Nelson Nauata, Jonathan Smith, Greg Mori

TL;DR
This paper introduces the use of Bidirectional Inference Neural Networks (BINN) to leverage hierarchical label structures for improved large-scale video classification, demonstrating significant performance gains on YouTube-8M datasets.
Contribution
It proposes a novel graph-based inference method using BINN that exploits label hierarchy for better video classification accuracy.
Findings
BINN outperforms baseline models on YouTube-8M datasets.
Hierarchical label inference improves classification performance.
BINN effectively captures label dependencies at multiple levels.
Abstract
Videos are a rich source of high-dimensional structured data, with a wide range of interacting components at varying levels of granularity. In order to improve understanding of unconstrained internet videos, it is important to consider the role of labels at separate levels of abstraction. In this paper, we consider the use of the Bidirectional Inference Neural Network (BINN) for performing graph-based inference in label space for the task of video classification. We take advantage of the inherent hierarchy between labels at increasing granularity. The BINN is evaluated on the first and second release of the YouTube-8M large scale multilabel video dataset. Our results demonstrate the effectiveness of BINN, achieving significant improvements against baseline models.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Music and Audio Processing · Video Analysis and Summarization
