D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching
Jingyu Liu, Minquan Wang, Ye Ma, Bo Wang, Aozhu Chen, Quan Chen, Peng Jiang, Xirong Li

TL;DR
This paper introduces D&M, a unified approach that detects key moments in e-commerce videos and matches them with appropriate sound effects (SFX) to enhance user engagement. It also builds a new dataset for this task.
Contribution
The paper presents a novel unified method for simultaneous key moment detection and SFX matching, along with a large-scale dataset for e-commerce videos.
Findings
D&M outperforms baseline methods in key moment detection and SFX matching.
The new dataset SFX-Moment enables comprehensive evaluation.
The unified approach improves the quality of video decoration with sound effects.
Abstract
Videos showcasing specific products are increasingly important for E-commerce. Key moments naturally exist as the first appearance of a specific product, presentation of its distinctive features, the presence of a buying link, etc. Adding proper sound effects (SFX) to these key moments, or video decoration with SFX (VDSFX), is crucial for enhancing user engagement. Previous studies on adding SFX to videos perform video-to-SFX matching at a holistic level, lacking the ability to add SFX to a specific moment. Meanwhile, previous studies on video highlight detection or video moment retrieval consider only moment localization, leaving moment-to-SFX matching untouched. By contrast, we propose in this paper D&M, a unified method that accomplishes key moment detection and moment-to-SFX matching simultaneously. Moreover, for the new VDSFX task we build a large-scale dataset…
Taxonomy
Topics: Video Analysis and Summarization
