Themes Informed Audio-visual Correspondence Learning

Runze Su; Fei Tao; Xudong Liu; Haoran Wei; Xiaorong Mei; Zhiyao Duan; Lei Yuan; Ji Liu; Yuying Xie

arXiv:2009.06573·cs.AI·October 20, 2020·5 cites

TL;DR

This paper introduces a theme-informed framework for audio-visual correspondence (AVC) learning tailored to short-term user-generated videos, and reports a 23.15% absolute improvement over the baseline on the new KWAI-AD-AudVis corpus of advertisement videos.

Contribution

The paper proposes new principles and a framework that incorporate video themes into AVC learning, and releases a large annotated corpus for evaluation.

Findings

01

Outperformed the baseline by 23.15% (absolute difference) on the KWAI-AD-AudVis corpus

02

Introduced a new large-scale UGV dataset with 85,432 videos

03

Demonstrated effectiveness of theme-informed AVC approach

Abstract

Applications for short-term user-generated video (UGV), such as Snapchat and YouTube short-term videos, have boomed recently, giving rise to many multimodal machine learning tasks. Among them, learning the correspondence between the audio and visual information in a video is a challenging one. Most previous work on audio-visual correspondence (AVC) learning has investigated only constrained videos or simple settings, which may not fit UGV applications. In this paper, we propose new principles for AVC and introduce a new framework that takes videos' themes into account to facilitate AVC learning. We also release the KWAI-AD-AudVis corpus, which contains 85,432 short advertisement videos (around 913 hours) made by users. We evaluate our proposed approach on this corpus, where it outperforms the baseline by 23.15% absolute difference.
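
As a rough sanity check on what "short" means here, the two corpus figures quoted in the abstract imply an average clip length. The sketch below uses only those two numbers; the function name is illustrative, and because the abstract says "around 913 hours" the result is approximate.

```python
# Figures quoted in the abstract; the hour count is stated as approximate.
NUM_VIDEOS = 85432
TOTAL_HOURS = 913


def mean_clip_seconds(num_videos: int, total_hours: float) -> float:
    """Average clip length in seconds implied by corpus size and total duration."""
    return total_hours * 3600 / num_videos


avg_seconds = mean_clip_seconds(NUM_VIDEOS, TOTAL_HOURS)
print(f"{avg_seconds:.1f} s per video on average")
```

Under these figures the average comes out at a little under 40 seconds per video, which matches the abstract's description of the corpus as short advertisement videos.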

Taxonomy

Topics: Subtitles and Audiovisual Media