Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding
Minnan Luo, Xiaojun Chang, Chen Gong

TL;DR
This paper presents a novel visual-semantic embedding approach for complex event detection in videos, utilizing reliability modeling of video segments and curriculum learning to improve detection accuracy on benchmark datasets.
Contribution
It introduces a reliability-aware multiple instance learning framework with a visual-semantic guided loss and negative elastic-net regularization for robust event detection.
Findings
Outperforms baseline algorithms on TRECVID datasets
Effectively models segment reliability for improved detection
Demonstrates robustness in complex event scenarios
Abstract
Multimedia event detection is the task of detecting a specific event of interest in a user-generated video on websites. The most fundamental challenge facing this task lies in the enormously varying quality of the video as well as the inherently high-level semantic abstraction of events. In this paper, we decompose the video into several segments and intuitively model the task of complex event detection as a multiple instance learning problem by representing each video as a "bag" of segments in which each segment is referred to as an instance. Instead of treating the instances equally, we associate each instance with a reliability variable to indicate its importance and then select reliable instances for training. To measure the reliability of the varying instances precisely, we propose a visual-semantic guided loss by exploiting low-level features from visual information together with…
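The bag-of-segments idea in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract is truncated before the visual-semantic guided loss is defined, so the sketch substitutes a plain hinge loss under a linear scorer and a hard threshold rule for the reliability variable, in the style of self-paced learning. All names here (`Bag`, `segment_losses`, `reliability_weights`, `select_reliable`, `threshold`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bag:
    """One video, represented as a bag of segment feature vectors."""
    segments: List[List[float]]  # one feature vector per segment (instance)
    label: int                   # +1 if the video contains the event, else -1


def segment_losses(bag: Bag, w: List[float], b: float) -> List[float]:
    """Hinge loss of each segment under a linear scorer.

    Assumption: this stands in for the paper's visual-semantic guided loss,
    which the truncated abstract does not define.
    """
    losses = []
    for x in bag.segments:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        losses.append(max(0.0, 1.0 - bag.label * score))
    return losses


def reliability_weights(losses: List[float], threshold: float) -> List[float]:
    """Hard reliability rule (assumed): 1.0 for low-loss segments, else 0.0."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def select_reliable(
    bag: Bag, w: List[float], b: float, threshold: float
) -> Tuple[List[List[float]], List[float]]:
    """Return the segments judged reliable, plus the reliability vector."""
    v = reliability_weights(segment_losses(bag, w, b), threshold)
    kept = [x for x, vi in zip(bag.segments, v) if vi > 0.0]
    return kept, v


# Toy example: a positive video with three 2-D segment features.
example_bag = Bag(segments=[[2.0, 0.0], [0.5, 1.0], [-1.0, 0.3]], label=1)
example_w, example_b = [1.0, 0.0], 0.0
kept_segments, reliability = select_reliable(
    example_bag, example_w, example_b, threshold=1.0
)
```

In this toy setting the third segment scores poorly against the positive label and is dropped from training. A curriculum-style schedule, as mentioned in the TL;DR, could be approximated by raising `threshold` over successive rounds so that harder segments are admitted gradually; that schedule is likewise an assumption rather than a detail taken from the paper.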
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Video Surveillance and Tracking Methods
