Overview of Tencent Multi-modal Ads Video Understanding Challenge

Zhenzhi Wang; Liyu Wu; Zhimin Li; Jiangfeng Xiong; Qinglin Lu

arXiv:2109.07951 · cs.CV · September 17, 2021 · 1 citation

TL;DR

This paper introduces the Tencent Multi-modal Ads Video Understanding Challenge, focusing on temporal segmentation and multi-label classification of ads videos to advance understanding in this domain and support related applications.

Contribution

It presents a comprehensive challenge with a new dataset, evaluation protocol, and baseline, addressing multi-modal, temporal, and multi-label aspects unique to ads videos.

Findings

01. Baseline ablation reveals key challenges in multi-modal, temporal, and multi-label understanding.

02. The dataset and evaluation protocol facilitate future research in ads video understanding.

03. The challenge aims to improve video recommendation and related applications.

Abstract

Multi-modal Ads Video Understanding Challenge is the first grand challenge aiming to comprehensively understand ads videos. Our challenge includes two tasks: video structuring in the temporal dimension and multi-modal video classification. It asks the participants to accurately predict both the scene boundaries and the multi-label categories of each scene based on a fine-grained and ads-related category hierarchy. Therefore, our task has four distinguishing features from previous ones: ads domain, multi-modal information, temporal segmentation, and multi-label classification. It will advance the foundation of ads video understanding and have a significant impact on many ads applications like video recommendation. This paper presents an overview of our challenge, including the background of ads videos, an elaborate description of task and dataset, evaluation protocol, and our proposed…
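
To make the task output described in the abstract concrete, here is a minimal sketch of what a prediction could look like: a list of scenes, each with a time span and a set of category labels. The `Scene` type, the helper functions, and the label names are all hypothetical illustrations, not the challenge's actual submission format or category hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    """One temporal segment of an ad video (hypothetical representation)."""
    start: float       # segment start, in seconds
    end: float         # segment end, in seconds
    labels: frozenset  # multi-label categories; names below are made up


def scene_boundaries(scenes):
    """Return the interior cut points between consecutive scenes."""
    ordered = sorted(scenes, key=lambda s: s.start)
    return [s.end for s in ordered[:-1]]


def is_valid_segmentation(scenes, duration, tol=1e-6):
    """Check that scenes tile [0, duration] with no gaps or overlaps."""
    ordered = sorted(scenes, key=lambda s: s.start)
    if not ordered:
        return False
    cursor = 0.0
    for s in ordered:
        # Each scene must start where the previous one ended and be non-empty.
        if abs(s.start - cursor) > tol or s.end <= s.start:
            return False
        cursor = s.end
    return abs(cursor - duration) <= tol


# Assumed example: a 15-second ad split into three scenes.
prediction = [
    Scene(0.0, 4.5, frozenset({"product_display", "indoor"})),
    Scene(4.5, 12.0, frozenset({"narration"})),
    Scene(12.0, 15.0, frozenset({"call_to_action", "text_overlay"})),
]
```

Here the scene boundaries fall at 4.5 s and 12.0 s, and each scene carries one or more labels, which is the pairing of temporal segmentation and multi-label classification that the challenge asks for.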

Taxonomy

Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Human Motion and Animation