AdSum: Two-stream Audio-visual Summarization for Automated Video Advertisement Clipping

Wen Xie; Yanjun Zhu; Gijs Overgoor; Yakov Bart; Agata Lapedriza Garcia; Sarah Ostadabbas

arXiv:2510.26569 · cs.CV · December 18, 2025

TL;DR

This paper presents AdSum, a two-stream audio-visual model for automated video advertisement clipping that frames the task as a shot selection problem and outperforms existing state-of-the-art methods on multiple metrics.
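Shot selection under a duration budget is commonly posed in video summarization as a 0/1 knapsack problem. The Python sketch below assumes that formulation; this page does not say which selection procedure AdSum actually uses. It averages hypothetical per-frame importance scores over each shot, then picks the set of shots whose total length (counted in frames) fits a target duration while maximizing the summed shot scores. The function names and example inputs are illustrative only.

```python
from typing import List, Sequence, Tuple


def shot_scores(frame_scores: Sequence[float],
                boundaries: Sequence[Tuple[int, int]]) -> List[float]:
    """Mean frame importance per shot; boundaries are (start, end) frame indices, end exclusive."""
    return [sum(frame_scores[s:e]) / (e - s) for s, e in boundaries]


def select_shots(values: Sequence[float], lengths: Sequence[int], budget: int) -> List[int]:
    """0/1 knapsack: choose shots maximizing total value with total length <= budget (frames)."""
    n = len(values)
    # best[i][c] = best total value using only the first i shots with capacity c
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], lengths[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip shot i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: keep shot i-1
    # Backtrack: a value change between rows means the shot was kept
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)


if __name__ == "__main__":
    frames = [0.9, 0.8, 0.1, 0.2, 0.7, 0.6, 0.6, 0.1, 0.1, 0.1]
    shots = [(0, 2), (2, 4), (4, 7), (7, 10)]
    values = shot_scores(frames, shots)
    lengths = [e - s for s, e in shots]
    print(select_shots(values, lengths, budget=5))  # [0, 2]
```

Using the mean rather than the sum makes a shot's value independent of its length, so length enters only through the budget constraint.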

Contribution

Introduces a two-stream audio-visual fusion model and AdSum204, a dedicated ad-specific dataset for automated video ad clipping.
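This page does not describe AdSum's architecture beyond calling it a two-stream audio-visual fusion model. As a loose illustration of the general idea only, the Python sketch below encodes each stream with its own small linear layer and ReLU, concatenates the two encodings per frame, and maps the result to an importance score in (0, 1) with a logistic output. The layer sizes, weights, and function names are hypothetical; in practice the weights would be learned from data.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def linear_relu(x: Vector, weights: Matrix) -> List[float]:
    """Hypothetical stream encoder: each row of `weights` yields one ReLU-activated unit."""
    return [max(0.0, sum(xi * wi for xi, wi in zip(x, row))) for row in weights]


def two_stream_scores(visual: Sequence[Vector], audio: Sequence[Vector],
                      w_visual: Matrix, w_audio: Matrix,
                      w_out: Vector, bias: float) -> List[float]:
    """Score each frame from frame-aligned visual and audio features (illustrative model)."""
    if len(visual) != len(audio):
        raise ValueError("visual and audio streams must be aligned frame by frame")
    scores = []
    for v, a in zip(visual, audio):
        # Encode each stream separately, then fuse by concatenation.
        fused = linear_relu(v, w_visual) + linear_relu(a, w_audio)
        logit = sum(f * w for f, w in zip(fused, w_out)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid -> importance in (0, 1)
    return scores


if __name__ == "__main__":
    # Two frames with identical visual features but different audio features:
    # the second frame scores higher, purely because of the audio stream.
    visual = [[1.0, 0.0], [1.0, 0.0]]
    audio = [[0.0], [2.0]]
    print(two_stream_scores(visual, audio, w_visual=[[1.0, 0.0]], w_audio=[[1.0]],
                            w_out=[1.0, 1.0], bias=-1.0))
```

Concatenation is only one way to fuse the streams; nothing here implies it is the scheme AdSum uses.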

Findings

01. Outperforms state-of-the-art methods on multiple metrics.
02. Develops the first ad-specific dataset for video summarization.
03. Demonstrates the importance of audio in ad video summarization.

Abstract

Advertisers commonly need multiple versions of the same advertisement (ad) at varying durations for a single campaign. The traditional approach involves manually selecting and re-editing shots from longer video ads to create shorter versions, which is labor-intensive and time-consuming. In this paper, we introduce a framework for automated video ad clipping using video summarization techniques. We are the first to frame video clipping as a shot selection problem, tailored specifically for advertising. Unlike existing general video summarization methods that primarily focus on visual content, our approach emphasizes the critical role of audio in advertising. To achieve this, we develop a two-stream audio-visual fusion model that predicts the importance of video frames, where importance is defined as the likelihood of a frame being selected in the firm-produced short ad. To address the…
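The abstract defines a frame's importance by whether it is selected in the firm-produced short ad. One way to turn that definition into training targets, assuming the shots of the long ad have already been matched to those reused in the short version (the matching step is not shown and is an assumption here), is to label every frame of a kept shot 1 and every other frame 0. A model trained on these binary targets with a logistic output can then be read as estimating the probability that a frame is selected. The helper name below is hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def frame_labels(num_frames: int, shots: Sequence[Tuple[int, int]],
                 kept_shots: Set[int]) -> List[int]:
    """Binary per-frame targets: 1 if the frame's shot appears in the short ad, else 0.

    `shots` holds (start, end) frame indices of the long ad, end exclusive;
    `kept_shots` holds indices into `shots` reused by the short ad (assumed known).
    """
    labels = [0] * num_frames
    for idx, (start, end) in enumerate(shots):
        if idx in kept_shots:
            labels[start:end] = [1] * (end - start)
    return labels


if __name__ == "__main__":
    print(frame_labels(6, [(0, 2), (2, 4), (4, 6)], kept_shots={0, 2}))  # [1, 1, 0, 0, 1, 1]
```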
