An Automatic Deep Learning Approach for Trailer Generation through Large Language Models
Roberto Balestri, Pasquale Cascarano, Mirko Degli Esposti, Guglielmo Pescatore

TL;DR
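This paper introduces an automated multimodal framework that uses large language models to generate movie trailers: it selects key scenes and quotes and creates audio-visual content such as background music and voiceovers. The authors report that the resulting trailers are more visually appealing than those produced by previous state-of-the-art methods.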
Contribution
The paper integrates large language models into an automated, multimodal framework for trailer generation.
Findings
Generated trailers are more visually appealing than those produced by previous state-of-the-art methods.
The framework effectively selects key visual sequences and quotes aligned with the movie's narrative.
Automated creation of music backgrounds and voiceovers enhances audience engagement.
Abstract
Trailers are short promotional videos designed to provide audiences with a glimpse of a movie. The process of creating a trailer typically involves selecting key scenes, dialogues and action sequences from the main content and editing them together in a way that effectively conveys the tone, theme and overall appeal of the movie. This often includes adding music, sound effects, visual effects and text overlays to enhance the impact of the trailer. In this paper, we present a framework exploiting a comprehensive multimodal strategy for automated trailer production. Also, a Large Language Model (LLM) is adopted across various stages of the trailer creation. First, it selects main key visual sequences that are relevant to the movie's core narrative. Then, it extracts the most appealing quotes from the movie, aligning them with the trailer's narrative. Additionally, the LLM assists in…
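The first two LLM stages named in the abstract (choosing key visual sequences, then choosing quotes) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Scene` record, the prompt wording, the helper names, and the `llm` callable (a stand-in for any text-in/text-out model client) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical stand-in for any LLM client: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class Scene:
    index: int
    description: str  # assumed: a short textual description of the scene
    dialogue: str     # assumed: subtitle text spoken in the scene ("" if none)


def parse_indices(reply: str, valid: Set[int]) -> List[int]:
    """Extract unique integers from the reply that refer to known scenes."""
    picked: List[int] = []
    for token in reply.replace(",", " ").split():
        if token.isdigit() and int(token) in valid and int(token) not in picked:
            picked.append(int(token))
    return picked


def select_key_scenes(scenes: List[Scene], plot: str, llm: LLM, k: int = 3) -> List[Scene]:
    """Ask the model for the k scenes most relevant to the core narrative."""
    listing = "\n".join(f"{s.index}: {s.description}" for s in scenes)
    prompt = (
        f"Plot summary: {plot}\nScenes:\n{listing}\n"
        f"Return the indices of the {k} scenes most relevant to the core narrative."
    )
    chosen = parse_indices(llm(prompt), {s.index for s in scenes})[:k]
    # Keep the original movie order rather than the model's ranking order.
    return [s for s in scenes if s.index in chosen]


def select_quotes(scenes: List[Scene], llm: LLM, k: int = 2) -> List[str]:
    """Ask the model for the k most appealing lines of dialogue."""
    spoken = [s for s in scenes if s.dialogue]
    listing = "\n".join(f"{s.index}: {s.dialogue}" for s in spoken)
    prompt = (
        f"Lines of dialogue:\n{listing}\n"
        f"Return the indices of the {k} most appealing quotes."
    )
    chosen = parse_indices(llm(prompt), {s.index for s in spoken})[:k]
    by_index = {s.index: s.dialogue for s in spoken}
    return [by_index[i] for i in chosen]
```

Passing the model as a plain callable keeps the sketch independent of any particular provider, and filtering the reply against the known scene indices guards against the model returning indices that do not exist.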
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
