Video SemNet: Memory-Augmented Video Semantic Network

Prashanth Vijayaraghavan; Deb Roy

arXiv:2011.10909·cs.CV·November 24, 2020

TL;DR

Video SemNet is a memory-augmented neural network that encodes the semantic descriptors of a video into an embedding. The embeddings are evaluated on genre prediction and IMDB rating prediction, and are used to study narrative elements and audience engagement.

Contribution

The paper presents a novel Memory-Augmented Video Semantic Network that effectively encodes semantic descriptors and learns video embeddings for narrative understanding.

Findings

01. Achieved a weighted F1 score of 0.72 on genre prediction.

02. Achieved a weighted F1 score of 0.63 on IMDB rating prediction.

03. Demonstrated the model's ability to measure audience engagement.
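
For readers unfamiliar with the metric reported above, a weighted F1 score is commonly computed as the per-class F1 averaged with weights proportional to each class's share of the true labels. The sketch below illustrates that standard definition only. It assumes a single-label setting, and the function name `weighted_f1` and the toy labels are invented for illustration; they are not the paper's evaluation code or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (single-label assumption)."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted this label and it was correct.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Toy labels, hypothetical and for illustration only.
y_true = ["drama", "drama", "comedy", "action"]
y_pred = ["drama", "comedy", "comedy", "action"]
print(round(weighted_f1(y_true, y_pred), 3))
```

Because each class is weighted by its support, frequent classes dominate the score, which matters when genre or rating classes are imbalanced.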

Abstract

Stories are a very compelling medium to convey ideas, experiences, social and cultural values. Narrative is a specific manifestation of the story that turns it into knowledge for the audience. In this paper, we propose a machine learning approach to capture the narrative elements in movies by bridging the gap between the low-level data representations and semantic aspects of the visual medium. We present a Memory-Augmented Video Semantic Network, called Video SemNet, to encode the semantic descriptors and learn an embedding for the video. The model employs two main components: (i) a neural semantic learner that learns latent embeddings of semantic descriptors and (ii) a memory module that retains and memorizes specific semantic patterns from the video. We evaluate the video representations obtained from variants of our model on two tasks: (a) genre prediction and (b) IMDB Rating…
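
The abstract does not say how the memory module reads or writes its contents. As a rough illustration of what a memory-augmented component can look like in general, the sketch below shows a content-based read: a query vector is compared with each memory slot by dot product, the scores are normalized with a softmax, and the read vector is the weighted sum of the slots. This is a generic mechanism assumed for illustration, not the paper's design; `memory_read`, the slot values, and the query are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def memory_read(query, memory):
    """Content-based read: attention weights over slots and their weighted sum."""
    # Similarity between the query and each memory slot (dot product).
    scores = [sum(q * k for q, k in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    # Read vector: attention-weighted combination of the slots.
    read = [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]
    return weights, read


# Hypothetical memory with three 2-dimensional slots and one query.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, read = memory_read(query, memory)
print(weights, read)
```

In this toy case the first and third slots score equally against the query and receive the same weight, while the second slot, which is orthogonal to the query, receives the least.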

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Human Pose and Action Recognition