An Inverse Partial Optimal Transport Framework for Music-guided Movie Trailer Generation
Yutong Wang, Sidan Zhu, Hongteng Xu, Dixin Luo

TL;DR
This paper introduces an inverse partial optimal transport framework that leverages multi-modal representations to generate movie trailers guided by music, improving both visual appeal and quantitative metrics.
Contribution
The study proposes a novel IPOT framework for music-guided trailer generation, integrating multi-modal latent representations and a bi-level optimization strategy.
Findings
IPOT outperforms existing methods in subjective visual quality.
The framework effectively matches visual and audio modalities.
Experimental results demonstrate superior quantitative metrics.
Abstract
Trailer generation is a challenging video clipping task that aims to select highlighting shots from long videos like movies and re-organize them in an attractive way. In this study, we propose an inverse partial optimal transport (IPOT) framework to achieve music-guided movie trailer generation. In particular, we formulate the trailer generation task as selecting and sorting key movie shots based on audio shots, which involves matching the latent representations across visual and acoustic modalities. We learn a multi-modal latent representation model in the proposed IPOT framework to achieve this aim. In this framework, a two-tower encoder derives the latent representations of movie and music shots, respectively, and an attention-assisted Sinkhorn matching network parameterizes the grounding distance between the shots' latent representations and the distribution of the movie shots.…
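To make the matching step concrete, the following is a minimal sketch of entropic partial optimal transport solved with plain Sinkhorn iterations. It is not the paper's attention-assisted matching network or its bi-level learning procedure. The cosine cost, the zero-cost dummy "unselected" column, the uniform per-shot masses, and the names sinkhorn, partial_plan, eps, and n_iters are all assumptions made for illustration. The idea shown is only the general one the abstract describes: every music shot must be matched, while only a subset of the movie shots ends up selected.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.05, n_iters=500):
    """Entropic OT plan between histograms a and b (each sums to 1)."""
    K = np.exp(-cost / eps)           # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # rescale rows toward marginal a
        v = b / (K.T @ u)             # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def partial_plan(movie_emb, music_emb, eps=0.05, n_iters=500):
    """Match n music shots to a subset of m >= n movie shots.

    Assumption: partial transport is reduced to balanced transport by
    appending one zero-cost dummy column that absorbs the mass of the
    movie shots left out of the trailer.
    """
    m, n = len(movie_emb), len(music_emb)
    assert m >= n, "need at least as many movie shots as music shots"

    # Cosine distance between L2-normalised embeddings (illustrative cost).
    mv = movie_emb / np.linalg.norm(movie_emb, axis=1, keepdims=True)
    mu = music_emb / np.linalg.norm(music_emb, axis=1, keepdims=True)
    cost = 1.0 - mv @ mu.T                          # shape (m, n)

    cost_ext = np.hstack([cost, np.zeros((m, 1))])  # add dummy column
    a = np.full(m, 1.0 / m)                         # movie-shot masses
    b = np.concatenate([np.full(n, 1.0 / m), [(m - n) / m]])
    plan = sinkhorn(cost_ext, a, b, eps, n_iters)
    return plan[:, :n]                              # drop the dummy column


# Toy data: six movie shots, three music shots that copy shots 4, 1 and 3.
movie = np.eye(6)
music = movie[[4, 1, 3]]
plan = partial_plan(movie, music)
order = plan.argmax(axis=0)   # movie shot chosen for each music shot, in music order
print(order)
```

Reading the plan column by column gives both the selection and the ordering, since the music shots are already in temporal order. Taking an argmax per column is a simplification: on real data two music shots can pick the same movie shot, so a practical pipeline would need a tie-breaking or assignment step on top of the plan.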
Taxonomy
Topics: Image Enhancement Techniques · Music Technology and Sound Studies · Computer Graphics and Visualization Techniques
