Video Ads Content Structuring by Combining Scene Confidence Prediction and Tagging

Tomoyuki Suzuki; Antonio Tejero-de-Pablos

arXiv:2108.09215·cs.CV·August 23, 2021·1 cites

TL;DR

This paper introduces a two-stage method for structuring video ads: it first detects scene boundaries, then tags each scene using multimodal data, improving on previous baselines on the challenging Tencent Advertisement Video dataset.

Contribution

The paper presents a novel two-stage approach combining scene boundary detection with confidence scoring and multimodal scene tagging for video ads.

Findings

01 Improved segmentation accuracy over baselines

02 Effective use of multimodal data for scene tagging

03 Enhanced performance on the Tencent Advertisement Video dataset

Abstract

Video ads segmentation and tagging is a challenging task for two main reasons: (1) the video scene structure is complex and (2) it includes multiple modalities (e.g., visual, audio, text). While previous work focuses mostly on activity videos (e.g., "cooking", "sports"), it is not clear how such methods can be leveraged to tackle the task of video ads content structuring. In this paper, we propose a two-stage method that first provides the boundaries of the scenes, and then combines a confidence score for each segmented scene and the tag classes predicted for that scene. We provide extensive experimental results on the network architectures and modalities used for the proposed method. Our combined method improves on the previous baselines on the challenging "Tencent Advertisement Video" dataset.
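The two-stage flow described in the abstract can be illustrated with a minimal sketch. It assumes the first stage has already produced boundary timestamps and that the second stage has produced one confidence value and one set of tag probabilities per segment. The abstract does not state how the confidence and the tag predictions are combined, so the product used here is an illustrative assumption, not the authors' formula. All names (`ScenePrediction`, `boundaries_to_segments`, `combine`) and the tag labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ScenePrediction:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    tag: str       # predicted tag class (hypothetical label)
    score: float   # combined score for this (segment, tag) pair


def boundaries_to_segments(
    boundaries: List[float], duration: float
) -> List[Tuple[float, float]]:
    """Turn boundary timestamps into (start, end) segments covering the video."""
    cuts = [0.0] + sorted(b for b in boundaries if 0.0 < b < duration) + [duration]
    return list(zip(cuts[:-1], cuts[1:]))


def combine(
    segments: List[Tuple[float, float]],
    confidences: List[float],
    tag_probs: List[Dict[str, float]],
    top_k: int = 1,
) -> List[ScenePrediction]:
    """Score each segment's top-k tags.

    Assumption: score = scene confidence * tag probability. The source only
    says the two quantities are combined; the product is a stand-in.
    """
    predictions = []
    for (start, end), conf, probs in zip(segments, confidences, tag_probs):
        top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        for tag, prob in top:
            predictions.append(ScenePrediction(start, end, tag, conf * prob))
    # Highest combined score first, as a ranked list of structured scenes.
    return sorted(predictions, key=lambda p: p.score, reverse=True)


# Made-up inputs standing in for the outputs of the two stages.
segments = boundaries_to_segments([4.0, 9.5], duration=15.0)
confidences = [0.9, 0.5, 0.8]
tag_probs = [
    {"product_display": 0.7, "narration": 0.2},
    {"product_display": 0.4, "narration": 0.6},
    {"product_display": 0.1, "narration": 0.5},
]
predictions = combine(segments, confidences, tag_probs, top_k=1)
```

Under this assumption, a confidently segmented scene with a moderately likely tag can outrank a weakly segmented scene with a more likely tag, which is one plausible reason to combine the two signals rather than rank by tag probability alone.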

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques