Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

Mohamed Elhoseiny; Jingen Liu; Hui Cheng; Harpreet Sawhney; Ahmed Elgammal

arXiv:1512.00818·cs.CV·December 17, 2015

TL;DR

This paper introduces a zero-shot event detection approach that embeds multimodal video content into a distributional semantic space, so that videos can be retrieved with free text event queries. It outperforms existing methods on the TRECVID MED benchmark.

Contribution

To the authors' knowledge, it is the first zero-shot event detection model built on distributional semantics that combines multimodal video embedding with automatic estimation of concept relevance to free text queries.

Findings

01

Outperforms state-of-the-art on TRECVID MED with higher MAP and ROC-AUC scores.

02

Enables fast retrieval of videos using only the event title as the query.

03

Demonstrates effective semantic embedding of multimodal video information.

Abstract

We propose a new zero-shot Event Detection method by Multi-modal Distributional Semantic embedding of videos. Our model embeds object and action concepts as well as other available modalities from videos into a distributional semantic space. To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional semantics and extends it in the following directions: (a) semantic embedding of multimodal information in videos (with focus on the visual modalities), (b) automatically determining relevance of concepts/attributes to a free text query, which could be useful for other applications, and (c) retrieving videos by free text event query (e.g., "changing a vehicle tire") based on their content. We embed videos into a distributional semantic space and then measure the similarity between videos and the event query in a free text form. We validated our…
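
The retrieval idea described in the abstract (embed video concepts and a free text query in one semantic space, then rank videos by similarity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-dimensional `TOY_VECTORS`, the helper names, the choice of cosine similarity, and the confidence-weighted sum are hypothetical stand-ins, not the paper's actual embedding or relevance formulation.

```python
import math

# Hypothetical 3-d word vectors (assumption). A real system would load
# pretrained distributional word embeddings instead.
TOY_VECTORS = {
    "changing": [0.5, 0.5, 0.0],
    "vehicle": [0.8, 0.2, 0.1],
    "tire": [0.9, 0.1, 0.0],
    "wheel": [0.85, 0.15, 0.05],
    "car": [0.8, 0.1, 0.1],
    "dog": [0.0, 0.1, 0.9],
    "grooming": [0.1, 0.3, 0.8],
}
DIM = 3


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embed_text(text):
    """Sum the vectors of known words; unknown words are skipped."""
    total = [0.0] * DIM
    for word in text.lower().split():
        for i, x in enumerate(TOY_VECTORS.get(word, [])):
            total[i] += x
    return total


def score_video(concept_scores, query):
    """Weight each detected concept's confidence by its (non-negative)
    similarity to the query and sum the results (illustrative rule)."""
    q = embed_text(query)
    return sum(
        conf * max(0.0, cosine(embed_text(concept), q))
        for concept, conf in concept_scores.items()
    )


def rank_videos(videos, query):
    """Return video names sorted from most to least relevant."""
    return sorted(videos, key=lambda name: score_video(videos[name], query),
                  reverse=True)
```

With detector confidences such as `{"clip_tire": {"wheel": 0.9, "car": 0.8}, "clip_dog": {"dog": 0.9, "grooming": 0.7}}`, calling `rank_videos(videos, "changing a vehicle tire")` places `clip_tire` first, because its concepts lie closer to the query in the toy space. No example video for the event is needed at query time, which is the sense in which the retrieval is zero-shot.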
