Fine-Tuning Open Video Generators for Cinematic Scene Synthesis: A Small-Data Pipeline with LoRA and Wan2.1 I2V
Meftun Akarsu, Kerem Catay, Sedat Bin Vedat, Enes Kutay Yarkan, Ilke Senturk, Arda Sar, Dafne Eksioglu

TL;DR
This paper introduces a practical, efficient pipeline for fine-tuning open-source video diffusion models to generate cinematic scenes from small datasets, combining LoRA adaptation and a two-stage process for style and motion synthesis.
Contribution
The paper presents a novel two-stage fine-tuning pipeline using LoRA modules and a video decoder, enabling cinematic scene synthesis from small datasets with high fidelity and temporal coherence.
Findings
Improved cinematic fidelity and temporal stability demonstrated by quantitative metrics.
Efficient domain transfer achieved within hours on a single GPU.
Pipeline supports reproducibility and adaptation across cinematic domains.
Abstract
We present a practical pipeline for fine-tuning open-source video diffusion transformers to synthesize cinematic scenes for television and film production from small datasets. The proposed two-stage process decouples visual style learning from motion generation. In the first stage, Low-Rank Adaptation (LoRA) modules are integrated into the cross-attention layers of the Wan2.1 I2V-14B model to adapt its visual representations using a compact dataset of short clips from Ay Yapim's historical television film El Turco. This enables efficient domain transfer within hours on a single GPU. In the second stage, the fine-tuned model produces stylistically consistent keyframes that preserve costume, lighting, and color grading, which are then temporally expanded into coherent 720p sequences through the model's video decoder. We further apply lightweight parallelization and sequence partitioning…
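As a rough illustration of the first stage, the sketch below shows the general LoRA idea on a single linear projection: the pretrained weight W stays frozen and only a low-rank update (alpha / r) * B A is trained. This is a toy in pure Python, not the authors' implementation. The class name `LoRALinear`, the `rank` and `alpha` parameters, and the initialization (small random A, zero B, the common LoRA convention) are assumptions for illustration. In the paper's setting, such adapters would sit on the cross-attention projections of Wan2.1 I2V-14B.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical toy layer: frozen weight W plus a trainable low-rank update.

    The effective weight is W + (alpha / rank) * B @ A, where
    A has shape (rank, d_in) and B has shape (d_out, rank).
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        self.scale = alpha / rank
        # Assumption: A starts as small Gaussian noise, B starts at zero,
        # so the adapter is a no-op before any training step.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Count only the adapter parameters (A and B), not the frozen W."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a weight of shape (d_out, d_in), the adapter trains rank * (d_in + d_out) parameters instead of d_in * d_out. That reduction is what makes adaptation on a small dataset and a single GPU plausible.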
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Human Motion and Animation
