An Automatic Deep Learning Approach for Trailer Generation through Large Language Models

Roberto Balestri; Pasquale Cascarano; Mirko Degli Esposti; Guglielmo Pescatore

arXiv:2601.23121 · cs.MM · February 2, 2026

Open Access

TL;DR

This paper introduces an automated multimodal framework that uses large language models to generate engaging movie trailers: it selects key scenes and quotes and creates audio-visual content such as background music and voiceovers, outperforming previous state-of-the-art methods in visual appeal.

Contribution

The paper presents a novel integration of large language models into an automated trailer generation framework for multimodal content creation.

Findings

1. Generated trailers are more visually appealing than those produced by previous state-of-the-art methods.
2. The framework effectively selects key visual sequences and quotes aligned with the movie's narrative.
3. Automatically created background music and voiceovers enhance audience engagement.

Abstract

Trailers are short promotional videos designed to give audiences a glimpse of a movie. Creating a trailer typically involves selecting key scenes, dialogues and action sequences from the main content and editing them together in a way that effectively conveys the tone, theme and overall appeal of the movie. This often includes adding music, sound effects, visual effects and text overlays to enhance the trailer's impact. In this paper, we present a framework that adopts a comprehensive multimodal strategy for automated trailer production, with a Large Language Model (LLM) used across several stages of trailer creation. First, the LLM selects the key visual sequences most relevant to the movie's core narrative. Then, it extracts the most appealing quotes from the movie and aligns them with the trailer's narrative. Additionally, the LLM assists in…
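The abstract describes the LLM choosing key visual sequences and quotes, but this excerpt does not give the prompts, models, or output formats used. As a minimal sketch, assuming candidate scenes and subtitle lines are already available as timestamped text and assuming a hypothetical `llm` callable (prompt string in, reply string out), an LLM-driven selection step could look like this. `Segment`, `build_prompt`, `parse_indices`, `select`, and `fit_to_budget` are illustrative names, not the authors' code, and the greedy trailer-length budget is an added assumption.

```python
# Hypothetical sketch, not the paper's implementation: `llm` is any function
# that maps a prompt string to a reply string (a real model or a stub).
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """A candidate clip: a scene description or a subtitle quote (assumed format)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # scene description or quoted dialogue


def build_prompt(task: str, segments: List[Segment], k: int) -> str:
    """Number each candidate so the LLM can answer with indices only."""
    listing = "\n".join(
        f"{i}: [{s.start:.1f}-{s.end:.1f}] {s.text}" for i, s in enumerate(segments)
    )
    return f"{task}\nReply with the indices of the {k} best items, comma-separated.\n{listing}"


def parse_indices(reply: str, n: int, k: int) -> List[int]:
    """Keep the first k distinct, in-range indices, preserving the LLM's order."""
    picked: List[int] = []
    for token in re.findall(r"\d+", reply):
        i = int(token)
        if i < n and i not in picked:
            picked.append(i)
        if len(picked) == k:
            break
    return picked


def select(llm: Callable[[str], str], task: str, segments: List[Segment], k: int) -> List[Segment]:
    """Ask the LLM for the k most suitable segments and return them in its order."""
    reply = llm(build_prompt(task, segments, k))
    return [segments[i] for i in parse_indices(reply, len(segments), k)]


def fit_to_budget(segments: List[Segment], max_seconds: float) -> List[Segment]:
    """Greedily keep segments, in order, while the total length fits the budget."""
    kept: List[Segment] = []
    total = 0.0
    for s in segments:
        length = s.end - s.start
        if total + length <= max_seconds:
            kept.append(s)
            total += length
    return kept


if __name__ == "__main__":
    scenes = [
        Segment(0, 5, "hero leaves home"),
        Segment(5, 12, "storm hits"),
        Segment(12, 20, "quiet dinner"),
        Segment(20, 26, "final duel"),
    ]
    fake_llm = lambda prompt: "3, 1"  # stand-in for a real model call
    picked = select(fake_llm, "Pick scenes central to the core narrative.", scenes, k=2)
    print([s.text for s in picked])  # ['final duel', 'storm hits']
```

Asking for indices rather than free text keeps each choice tied to real timestamps, so an editing stage could cut the clips directly; a real system would also need to handle replies that ignore the requested format, which `parse_indices` only partially covers by discarding out-of-range or repeated numbers.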

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis