Highlight Timestamp Detection Model for Comedy Videos via Multimodal Sentiment Analysis

Fan Huang

arXiv:2106.00451·cs.CV·June 2, 2021

TL;DR

This paper introduces a multimodal sentiment analysis model to detect highlight timestamps in comedy videos, addressing the challenge of understanding abstract and contextual humor features beyond basic object recognition.

Contribution

It proposes a novel multimodal structure combining video, audio, and text data for improved comedy highlight detection, achieving state-of-the-art performance.

Findings

01. Achieved high accuracy in comedy highlight detection

02. Identified key multimodal features for humor recognition

03. Compared multiple models to find the most effective approach

Abstract

Videos are now pervasive on the Internet. Understanding them precisely and in depth is a difficult but valuable problem for both platforms and researchers. Existing video understanding models do well in object recognition tasks but still cannot understand abstract and contextual features such as humorous highlight frames in comedy videos. Current industrial work also focuses mainly on basic category classification based on the appearance of objects. Feature detection methods for abstract categories remain unexplored. A data structure that includes the information of video frames, audio spectrum, and text provides a new direction to explore. Multimodal models are proposed to make this in-depth video understanding possible. In this paper, we analyze the difficulties in abstract understanding of videos and propose a multimodal…
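
The abstract mentions a data structure that holds video frame, audio spectrum, and text information together. As a rough illustration only, and not the paper's actual architecture, the sketch below shows one way such a time-aligned record could be fused and scored to yield highlight timestamps. Every name here (`Segment`, `fuse`, `score`, `highlight_timestamps`), the concatenation fusion, and the toy linear scorer are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One time-aligned slice of a video (hypothetical layout)."""
    start_s: float
    end_s: float
    frame_feat: List[float]  # assumed: pooled visual features for the slice
    audio_feat: List[float]  # assumed: pooled audio-spectrum features
    text_feat: List[float]   # assumed: features from subtitles or transcript


def fuse(seg: Segment) -> List[float]:
    """Concatenate the three modality vectors into one feature vector."""
    return seg.frame_feat + seg.audio_feat + seg.text_feat


def score(fused: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Toy linear scorer standing in for a trained model (assumption)."""
    return sum(w * x for w, x in zip(weights, fused)) + bias


def highlight_timestamps(
    segments: List[Segment], weights: List[float], threshold: float
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs for segments scoring above the threshold."""
    return [
        (s.start_s, s.end_s)
        for s in segments
        if score(fuse(s), weights) > threshold
    ]


if __name__ == "__main__":
    demo = [
        Segment(0.0, 5.0, [0.1], [0.2], [0.0]),
        Segment(5.0, 10.0, [0.9], [0.8], [1.0]),
    ]
    print(highlight_timestamps(demo, weights=[1.0, 1.0, 1.0], threshold=1.0))
```

In a real system the feature vectors would come from learned encoders and the scorer would be a trained network; the point of the sketch is only the shape of the data and the per-segment thresholding step.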

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Human Pose and Action Recognition