Self-Paced and Self-Corrective Masked Prediction for Movie Trailer Generation
Sidan Zhu, Hongteng Xu, Dixin Luo

TL;DR
This paper introduces SSMP, a novel self-paced, self-corrective masked prediction method using a Transformer encoder for automatic movie trailer generation, outperforming existing approaches through bi-directional contextual modeling and progressive self-correction.
Contribution
The paper proposes a new self-paced, self-corrective masked prediction framework for trailer generation that moves beyond the traditional selection-then-ranking paradigm and achieves state-of-the-art results.
Findings
SSMP achieves superior trailer quality compared to existing methods.
Self-paced masking improves the model's adaptability and performance.
Progressive self-correction mimics the human editing process.
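To make progressive self-correction concrete, here is a minimal Python sketch of iterative mask-and-repredict decoding, similar in spirit to confidence-based iterative decoding in masked generative models. It is not SSMP's published procedure. Assumptions: each trailer slot selects one movie shot by index; `make_toy_predictor` is a hypothetical stand-in for the Transformer encoder (fixed slot-to-shot affinities plus a penalty for reusing a shot already placed in another slot); and the linear re-masking schedule and the `steps` parameter are illustrative choices.

```python
import math
from typing import Callable, List, Optional, Sequence

MASK = None  # placeholder for a masked (not yet decided) trailer slot

# Hypothetical interface: maps a partially masked trailer to per-slot
# probabilities over movie shots.
Predictor = Callable[[List[Optional[int]]], List[List[float]]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_predictor(scores: Sequence[Sequence[float]], repeat_penalty: float = 2.0) -> Predictor:
    """Hypothetical stand-in for the Transformer encoder.

    scores[t][n] is a fixed affinity between trailer slot t and movie shot n.
    Context enters through a penalty on shots already placed in other slots,
    so a slot's prediction changes as the rest of the draft changes.
    """

    def predict(seq: List[Optional[int]]) -> List[List[float]]:
        probs = []
        for t in range(len(seq)):
            used = {s for i, s in enumerate(seq) if i != t and s is not MASK}
            logits = [
                score - (repeat_penalty if n in used else 0.0)
                for n, score in enumerate(scores[t])
            ]
            probs.append(softmax(logits))
        return probs

    return predict


def self_correcting_decode(predict: Predictor, length: int, steps: int = 4) -> List[Optional[int]]:
    """Start fully masked; each round re-predicts every slot, then re-masks the least confident ones."""
    seq: List[Optional[int]] = [MASK] * length
    for k in range(steps):
        probs = predict(seq)
        # Re-predict all slots, including filled ones, so earlier choices can be revised.
        seq = [max(range(len(p)), key=p.__getitem__) for p in probs]
        conf = [probs[t][shot] for t, shot in enumerate(seq)]
        # Linear schedule: re-mask fewer slots each round and none after the last round.
        n_remask = length * (steps - k - 1) // steps
        for t in sorted(range(length), key=conf.__getitem__)[:n_remask]:
            seq[t] = MASK
    return seq
```

With `scores = [[3.0, 0.0, 0.0], [2.5, 2.0, 0.0], [0.0, 0.0, 1.0]]`, a single pass (`steps=1`) returns `[0, 0, 2]` and places shot 0 twice, whereas `steps=3` re-masks the less confident slots and converges to `[0, 1, 2]`. Because every round re-predicts filled slots as well, an early choice can be revised, much as an editor revises a rough cut.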
Abstract
As a challenging video editing task, movie trailer generation involves selecting and reorganizing movie shots to create engaging trailers. Currently, most existing automatic trailer generation methods employ a "selection-then-ranking" paradigm (i.e., first selecting key shots and then ranking them), which suffers from inevitable error propagation and limits the quality of the generated trailers. Beyond this paradigm, we propose a new self-paced and self-corrective masked prediction method called SSMP, which achieves state-of-the-art results in automatic trailer generation via bi-directional contextual modeling and progressive self-correction. In particular, SSMP trains a Transformer encoder that takes the movie shot sequences as prompts and generates corresponding trailer shot sequences accordingly. The model is trained via masked prediction, reconstructing each trailer shot sequence…
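The masked-prediction training objective described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code. Trailer shots are represented as indices into the movie's shot list. The encoder's output is taken as given: a per-slot probability distribution over movie shots, conditioned on the movie-shot prompt and the visible trailer slots. The loss is computed on masked slots only. `update_mask_ratio`, together with its `threshold` and `step` values, is a hypothetical self-paced rule that raises the masking ratio once the model reconstructs the current level reliably; the paper's actual pacing rule may differ.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

MASK = None  # placeholder for a hidden trailer slot


def mask_trailer(
    trailer: Sequence[int], ratio: float, rng: random.Random
) -> Tuple[List[Optional[int]], List[int]]:
    """Hide about ratio * len(trailer) shots; return the masked copy and hidden positions."""
    n_mask = min(len(trailer), max(1, round(ratio * len(trailer))))
    positions = sorted(rng.sample(range(len(trailer)), n_mask))
    masked: List[Optional[int]] = list(trailer)
    for t in positions:
        masked[t] = MASK
    return masked, positions


def masked_prediction_loss(
    probs: Sequence[Sequence[float]], trailer: Sequence[int], positions: Sequence[int]
) -> float:
    """Mean negative log-likelihood of the true shot, over masked slots only.

    probs[t][n] is the encoder's (assumed) probability that trailer slot t is
    movie shot n; visible slots contribute nothing to the loss.
    """
    return -sum(math.log(probs[t][trailer[t]]) for t in positions) / len(positions)


def update_mask_ratio(
    ratio: float, accuracy: float, threshold: float = 0.8, step: float = 0.1
) -> float:
    """Hypothetical self-paced rule: mask more only after the model handles the current ratio."""
    if accuracy >= threshold:
        return min(1.0, ratio + step)
    return ratio
```

For example, if the encoder predicts a uniform distribution over five movie shots, the loss equals log 5 whichever slots were masked, since only the hidden positions are scored.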
Taxonomy
Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
