Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos
Mohamed Elhoseiny, Jingen Liu, Hui Cheng, Harpreet Sawhney, Ahmed Elgammal

TL;DR
This paper introduces a zero-shot event detection approach that embeds multimodal video content into a distributional semantic space, so that videos can be retrieved from free-text event queries. It outperforms existing methods on the TRECVID MED benchmark.
Contribution
To the authors' knowledge, it is the first zero-shot event detection model built on distributional semantics that incorporates multimodal video embedding and relevance estimation for free-text queries.
Findings
Outperforms the state of the art on TRECVID MED, with higher MAP and ROC-AUC scores.
Enables fast retrieval of videos using only the event title as the query.
Demonstrates effective semantic embedding of multimodal video information.
Abstract
We propose a new zero-shot Event Detection method by Multi-modal Distributional Semantic embedding of videos. Our model embeds object and action concepts as well as other available modalities from videos into a distributional semantic space. To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional semantics and extends it in the following directions: (a) semantic embedding of multimodal information in videos (with focus on the visual modalities), (b) automatically determining relevance of concepts/attributes to a free text query, which could be useful for other applications, and (c) retrieving videos by free text event query (e.g., "changing a vehicle tire") based on their content. We embed videos into a distributional semantic space and then measure the similarity between videos and the event query in a free text form. We validated our…
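The idea of embedding videos and a free-text query into one semantic space, then ranking by similarity, can be illustrated with a minimal sketch. This is not the authors' formulation: the toy word vectors, the confidence-weighted sum used for the video embedding, the averaged query embedding, and the names `WORD_VECTORS`, `embed_query`, `embed_video`, and `rank_videos` are all assumptions made for illustration. A real system would use word vectors trained on a large text corpus and concept detectors run on the video.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values, for illustration only).
WORD_VECTORS = {
    "changing": [0.9, 0.1, 0.0],
    "vehicle": [0.8, 0.2, 0.1],
    "tire": [1.0, 0.0, 0.1],
    "car": [0.9, 0.1, 0.1],
    "wheel": [0.9, 0.0, 0.2],
    "dog": [0.0, 1.0, 0.1],
    "grass": [0.1, 0.9, 0.3],
}
DIM = 3


def embed_query(text):
    """Average the vectors of the query words that have a known vector."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def embed_video(concept_scores):
    """Sum concept word vectors, weighted by detector confidence (assumed scheme)."""
    embedding = [0.0] * DIM
    for concept, confidence in concept_scores.items():
        vector = WORD_VECTORS.get(concept)
        if vector is None:
            continue  # skip concepts that have no word vector
        for i in range(DIM):
            embedding[i] += confidence * vector[i]
    return embedding


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def rank_videos(query, videos):
    """Return (video_id, similarity) pairs sorted from most to least similar."""
    query_vec = embed_query(query)
    scored = [(vid, cosine(query_vec, embed_video(scores))) for vid, scores in videos.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical concept-detector outputs for two videos.
videos = {
    "tire_change": {"car": 0.9, "wheel": 0.8, "dog": 0.05},
    "dog_park": {"dog": 0.95, "grass": 0.7, "car": 0.05},
}
ranking = rank_videos("changing a vehicle tire", videos)
```

In this sketch no video is labeled with the event. The ranking depends only on how close the detected concepts are to the query words in the shared vector space, which is what makes the retrieval zero-shot.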
