Dynamic Concept Composition for Zero-Example Event Detection
Xiaojun Chang, Yi Yang, Guodong Long, Chengqi Zhang, and Alexander G. Hauptmann

TL;DR
This paper introduces a zero-shot event detection method that dynamically composes concept classifiers based on semantic relevance and online video descriptions, enabling effective event detection without training examples.
Contribution
The paper proposes learning the weights of the concept classifiers separately for each test video, instead of applying one fixed set of weights to all videos, with the aim of improving zero-shot event detection performance.
Findings
Outperforms existing zero-shot event detection methods on TRECVID MED datasets.
Demonstrates the effectiveness of dynamic concept composition in real-world videos.
Achieves superior accuracy compared to fixed-weight approaches.
Abstract
In this paper, we focus on automatically detecting events in unconstrained videos without the use of any visual training exemplars. In principle, zero-shot learning makes it possible to train an event detection model based on the assumption that events (e.g. "birthday party") can be described by multiple mid-level semantic concepts (e.g. "blowing candle", "birthday cake"). Towards this goal, we first pre-train a bundle of concept classifiers using data from other sources. Then we evaluate the semantic correlation of each concept with respect to the event of interest and select the relevant concept classifiers, which are applied to all test videos to get multiple prediction score vectors. While most existing systems combine the predictions of the concept classifiers with fixed weights, we propose to learn the optimal weights of the concept classifiers for each testing video by exploring a set…
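The pipeline the abstract describes (score concept relevance, keep the relevant classifiers, combine their predictions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names are hypothetical, cosine similarity between embedding vectors stands in for the unspecified "semantic correlation", and the per-video weights are simply passed in, because the abstract is truncated before it explains how they are learned.

```python
import numpy as np

def select_concepts(event_vec, concept_vecs, k):
    """Return indices and relevance of the k concepts closest to the event.

    Assumption: relevance is the cosine similarity between embeddings.
    """
    event_unit = event_vec / np.linalg.norm(event_vec)
    concept_unit = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    relevance = concept_unit @ event_unit
    top = np.argsort(-relevance)[:k]  # indices in order of descending relevance
    return top, relevance[top]

def fuse_fixed(scores, weights):
    """Fixed-weight baseline: one weight per concept, shared by all videos.

    scores has shape (n_videos, n_concepts); weights are assumed non-negative.
    """
    w = weights / weights.sum()
    return scores @ w

def fuse_per_video(scores, weights_per_video):
    """Per-video fusion: row i of weights_per_video is applied to video i.

    How these weights are learned is NOT shown here; they are placeholders.
    """
    w = weights_per_video / weights_per_video.sum(axis=1, keepdims=True)
    return np.sum(scores * w, axis=1)

# Toy data: 3 concepts in a 2-D embedding space, 2 test videos.
event_vec = np.array([1.0, 0.0])
concept_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
top, relevance = select_concepts(event_vec, concept_vecs, k=2)

# Classifier scores for the selected concepts only (rows are videos).
scores = np.array([[0.9, 0.5], [0.1, 0.3]])
fixed = fuse_fixed(scores, relevance)
per_video = fuse_per_video(scores, np.array([[3.0, 1.0], [1.0, 1.0]]))
```

The only difference between the two fusion functions is the shape of the weights: a single vector in the baseline, and one row per video in the per-video variant.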
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Multimodal Machine Learning Applications
