Ad-Net: Audio-Visual Convolutional Neural Network for Advertisement Detection In Videos
Shervin Minaee, Imed Bouazizi, Prakash Kolan, Hossein Najafzadeh

TL;DR
This paper introduces Ad-Net, a two-stream audio-visual CNN that detects commercials in videos by analyzing both audio and visual content, enabling personalized advertisement replacement.
Contribution
The paper presents a novel two-stream CNN architecture that effectively combines audio and visual information for commercial detection in videos, outperforming models with hand-crafted features.
Findings
The model achieved significantly higher accuracy than previous methods that rely on hand-crafted features.
Using both audio and visual data improves detection performance.
The dataset included over 50,000 video and commercial shots.
Abstract
Personalized advertising is a crucial task for many online businesses and video broadcasters. Many of today's broadcasters use the same commercial for all customers, but different viewers have different interests, so it seems reasonable to show customized commercials to different groups of people, chosen based on their demographic features and history. In this project, we propose a framework that takes broadcast videos, analyzes them, detects the commercials, and replaces them with more suitable ones. We propose a two-stream audio-visual convolutional neural network in which one branch analyzes the visual information and the other analyzes the audio information; the audio and visual embeddings are then fused and used for commercial detection and content categorization. We show that using both the visual and audio content of the…
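The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say how the two embeddings are fused, so concatenation followed by a single linear layer and softmax is an assumption here. The embedding sizes, the class order, the function names, and the random vectors standing in for the outputs of the two CNN branches are all hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(visual_emb, audio_emb, weights, bias):
    """Late fusion by concatenation (an assumption), then a linear classifier."""
    fused = visual_emb + audio_emb  # list concatenation, not element-wise sum
    return softmax(linear(fused, weights, bias))


# Hypothetical sizes; random values stand in for the two CNN branch outputs.
rng = random.Random(0)
VISUAL_DIM, AUDIO_DIM, NUM_CLASSES = 8, 4, 2  # assumed classes: [regular, commercial]
visual_emb = [rng.gauss(0, 1) for _ in range(VISUAL_DIM)]
audio_emb = [rng.gauss(0, 1) for _ in range(AUDIO_DIM)]
weights = [[rng.gauss(0, 0.1) for _ in range(VISUAL_DIM + AUDIO_DIM)]
           for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

probs = fuse_and_classify(visual_emb, audio_emb, weights, bias)
print(probs)  # two class probabilities that sum to 1
```

In a trained model, the weights would be learned jointly with both branches, and a second classifier head on the same fused vector could serve the content categorization task the abstract mentions.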
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis
