Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding

Minnan Luo; Xiaojun Chang; Chen Gong

arXiv:2110.08063 · cs.CV · October 18, 2021

TL;DR

This paper presents a novel visual-semantic embedding approach for complex event detection in videos, utilizing reliability modeling of video segments and curriculum learning to improve detection accuracy on benchmark datasets.
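
The TL;DR mentions curriculum learning. As a rough, hypothetical illustration of the easy-to-hard idea (not the authors' algorithm), the following sketch runs a self-paced loop on toy scalar data. The "model" is a single weighted mean, an instance is admitted only when its squared error falls below a pace parameter `lam`, and `lam` grows by a factor `growth` each round so harder instances can enter later. The function name `self_paced_fit`, the median initialization, and the hard 0/1 weighting are all assumptions made for illustration.

```python
import numpy as np


def self_paced_fit(x, lam=1.0, growth=1.5, rounds=5):
    """Toy curriculum (self-paced) loop; hypothetical, not the paper's method.

    Alternates between (a) choosing which instances are "easy enough"
    (squared error below the pace parameter lam) and (b) refitting a
    scalar estimate on the chosen instances. lam grows every round.
    """
    x = np.asarray(x, dtype=float)
    theta = float(np.median(x))  # assumed robust starting estimate
    history = []
    for _ in range(rounds):
        losses = (x - theta) ** 2
        v = (losses < lam).astype(float)  # hard 0/1 instance selection
        if v.sum() > 0:
            # Least-squares refit of a constant on the selected instances.
            theta = float((v * x).sum() / v.sum())
        history.append((lam, int(v.sum()), theta))
        lam *= growth  # relax the pace so harder instances may join later
    return theta, history
```

With `x = [1.0, 1.2, 0.8, 1.1, 0.9, 8.0]` and the default settings, the outlying value 8.0 never falls under the pace threshold within five rounds, so the estimate stays at the mean of the five consistent values (1.0); starting from a much larger `lam` admits it immediately.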

Contribution

It introduces a reliability-aware multiple instance learning framework with a visual-semantic guided loss and negative elastic-net regularization for robust event detection.
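
The exact form of the negative elastic-net regularizer is not given in this summary. Purely as an assumption, take an elastic-net-style penalty with a negated ℓ1 term, r(v) = -λ Σ v_i + (μ/2) Σ v_i², added to the reliability-weighted loss Σ v_i ℓ_i with v ∈ [0, 1]^n. The objective then separates per instance; setting the derivative ℓ_i - λ + μ v_i to zero and projecting onto [0, 1] gives v_i = clip((λ - ℓ_i) / μ, 0, 1). Low-loss segments receive weights near 1 and segments with loss above λ receive 0. The helper `reliability_weights` is hypothetical.

```python
import numpy as np


def reliability_weights(losses, lam, mu):
    """Closed-form minimizer over v in [0, 1]^n of
        sum_i v_i * l_i - lam * sum_i v_i + (mu / 2) * sum_i v_i ** 2.

    This regularizer is an assumed elastic-net-style form, not
    necessarily the one used in the paper.
    """
    if mu <= 0:
        raise ValueError("mu must be positive so each per-instance problem is strictly convex")
    losses = np.asarray(losses, dtype=float)
    # Stationary point (lam - l_i) / mu, projected onto the box [0, 1].
    return np.clip((lam - losses) / mu, 0.0, 1.0)
```

For example, `reliability_weights([0.1, 0.5, 1.0, 2.0], lam=1.0, mu=1.0)` yields weights of roughly 0.9, 0.5, 0 and 0. The quadratic term turns a hard 0/1 selection into a graded weighting, while λ still acts as the cut-off beyond which a segment is ignored.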

Findings

1. Outperforms baseline algorithms on TRECVID datasets
2. Effectively models segment reliability for improved detection
3. Demonstrates robustness in complex event scenarios

Abstract

Multimedia event detection is the task of detecting a specific event of interest in user-generated videos on websites. The most fundamental challenges of this task are the enormously varying quality of the videos and the inherently high-level semantic abstraction of events. In this paper, we decompose each video into several segments and intuitively model complex event detection as a multiple instance learning problem, representing each video as a "bag" of segments in which each segment is referred to as an instance. Instead of treating the instances equally, we associate each instance with a reliability variable that indicates its importance, and then select reliable instances for training. To measure the reliability of the varying instances precisely, we propose a visual-semantic guided loss by exploiting low-level features from visual information together with…
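
To make the bag-of-segments formulation concrete, here is a minimal sketch under several assumptions the abstract does not specify: a video arrives as a `(num_frames, dim)` array of frame features; it is cut into `num_segments` contiguous, roughly equal segments that are mean-pooled into instance features; every instance inherits the bag's label (+1 or -1); a linear scorer `w` yields a hinge loss per instance; and instances whose loss is below `threshold` are kept as "reliable". The names `video_to_bag` and `select_reliable` are hypothetical, and the paper's actual reliability measure (the visual-semantic guided loss) is not reproduced here.

```python
import numpy as np


def video_to_bag(frame_features, num_segments):
    """Split a (num_frames, dim) array into contiguous segments and mean-pool each.

    Equal-length splitting and mean pooling are illustrative assumptions.
    Returns a (num_segments, dim) array: one instance per segment.
    """
    frame_features = np.asarray(frame_features, dtype=float)
    if not 1 <= num_segments <= len(frame_features):
        raise ValueError("num_segments must be between 1 and the number of frames")
    segments = np.array_split(frame_features, num_segments, axis=0)
    return np.stack([seg.mean(axis=0) for seg in segments])


def select_reliable(bag, bag_label, w, threshold):
    """Score every instance with a linear model and keep low-loss ones.

    Each instance inherits the bag label (+1 or -1); the hinge loss and
    the fixed threshold are hypothetical stand-ins for a reliability measure.
    Returns (boolean mask of kept instances, per-instance losses).
    """
    scores = np.asarray(bag, dtype=float) @ np.asarray(w, dtype=float)
    losses = np.maximum(0.0, 1.0 - bag_label * scores)
    return losses < threshold, losses
```

For frames `[[1, 0], [1, 0], [0, 1], [0, 1], [2, 0], [2, 0]]` split into three segments, the instances are `[1, 0]`, `[0, 1]` and `[2, 0]`; with `w = [1, -1]` and a positive bag label the hinge losses are 0, 2 and 0, so only the middle segment is dropped at `threshold=0.5`.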

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Video Surveillance and Tracking Methods