Weakly Supervised Learning of Heterogeneous Concepts in Videos
Sohil Shah, Kuldeep Kulkarni, Arijit Biswas, Ankit Gandhi, Om Deshmukh, and Larry Davis

TL;DR
This paper introduces a generalization of the Indian Buffet Process (IBP) for weakly supervised learning in videos. The model classifies and localizes heterogeneous concepts while enforcing location constraints, and it outperforms existing methods on the Casablanca and A2D datasets.
Contribution
It extends the IBP to handle heterogeneous concepts and location constraints in videos, providing a unified probabilistic framework for weakly supervised learning.
Findings
24% improvement in concept classification on Casablanca dataset
9% improvement in localization on A2D dataset
Effective integration of heterogeneous concepts and location constraints
Abstract
Typical textual descriptions that accompany online videos are 'weak': i.e., they mention the main concepts in the video but not their corresponding spatio-temporal locations. The concepts in the description are typically heterogeneous (e.g., objects, persons, actions). Certain location constraints on these concepts can also be inferred from the description. The goal of this paper is to present a generalization of the Indian Buffet Process (IBP) that can (a) systematically incorporate heterogeneous concepts in an integrated framework, and (b) enforce location constraints, for efficient classification and localization of the concepts in the videos. Finally, we develop posterior inference for the proposed formulation using mean-field variational approximation. Comparative evaluations on the Casablanca and the A2D datasets show that the proposed approach significantly outperforms other…
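For readers unfamiliar with the prior that the abstract builds on, the following is a minimal sketch of the standard one-parameter Indian Buffet Process, not the paper's generalization: it has no heterogeneous concept types, no location constraints, and no variational inference. The function name `sample_ibp` and the parameters `num_customers` and `alpha` are illustrative assumptions. One possible reading, also an assumption here, is that rows stand for spatio-temporal tracks in a video and columns for latent concepts.

```python
import numpy as np


def sample_ibp(num_customers, alpha, rng):
    """Draw a binary matrix Z from the standard one-parameter IBP.

    Hypothetical helper for illustration only; this is the textbook
    prior, not the generalized model described in the abstract.
    """
    dish_counts = []  # dish_counts[k] = earlier customers who took dish k
    rows = []
    for i in range(1, num_customers + 1):
        # Customer i takes existing dish k with probability m_k / i.
        row = [int(rng.random() < m / i) for m in dish_counts]
        for k, taken in enumerate(row):
            dish_counts[k] += taken
        # Customer i then samples Poisson(alpha / i) brand-new dishes.
        num_new = int(rng.poisson(alpha / i))
        row.extend([1] * num_new)
        dish_counts.extend([1] * num_new)
        rows.append(row)
    # Pad the ragged rows into a rectangular binary matrix.
    Z = np.zeros((num_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z


Z = sample_ibp(num_customers=6, alpha=2.0, rng=np.random.default_rng(0))
print(Z)
```

The number of columns is not fixed in advance, which is why an IBP-style prior suits settings where the number of concepts in a video is unknown. Each column is created by the first row that uses it, so no column of `Z` is all zeros.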
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
