TL;DR
FilMaster is an AI system that integrates cinematic principles and professional workflows to generate high-quality, engaging films with realistic camera language and cinematic rhythm, advancing AI-driven film production.
Contribution
The paper introduces FilMaster, a novel end-to-end AI system that incorporates real-world cinematic principles and professional post-production workflows for automated film generation.
Findings
Superior camera language generation guided by reference clips
Effective cinematic rhythm control through audience-centric modules
Outperforms existing methods in producing professional-quality films
Abstract
AI-driven content creation has shown potential in film production. However, existing film generation systems struggle to implement cinematic principles and thus fail to generate professional-quality films, particularly lacking diverse camera language and cinematic rhythm. This results in templated visuals and unengaging narratives. To address this, we introduce FilMaster, an end-to-end AI system that integrates real-world cinematic principles for professional-grade film generation, yielding editable, industry-standard outputs. FilMaster is built on two key principles: (1) learning cinematography from extensive real-world film data and (2) emulating professional, audience-centric post-production workflows. Inspired by these principles, FilMaster incorporates two stages: a Reference-Guided Generation Stage which transforms user input to video clips, and a Generative Post-Production Stage…
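The two-stage structure described in the abstract can be pictured with a toy sketch. Everything in it is hypothetical: the `Clip` type, the function names, the word-overlap retrieval, and the fixed-duration trim are stand-ins for components the abstract does not specify. It is not FilMaster's implementation.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    description: str
    duration: float  # seconds


# Hypothetical reference corpus; a stand-in for real-world film data.
REFERENCE_CLIPS = [
    Clip("slow dolly in on a face", 4.0),
    Clip("wide aerial shot of a city", 6.0),
    Clip("handheld chase through a market", 3.0),
]


def retrieve_reference(scene: str, corpus: list) -> Clip:
    """Toy retrieval: pick the reference clip sharing the most words with the scene."""
    scene_words = set(scene.lower().split())
    return max(
        corpus,
        key=lambda c: len(scene_words & set(c.description.lower().split())),
    )


def reference_guided_generation(scenes: list) -> list:
    """Stage 1 (sketch): map each scene to a clip guided by a retrieved reference."""
    clips = []
    for scene in scenes:
        ref = retrieve_reference(scene, REFERENCE_CLIPS)
        # A real system would call a video generator here; we only record the plan.
        clips.append(Clip(f"{scene} [camera: {ref.description}]", ref.duration))
    return clips


def generative_post_production(clips: list, max_duration: float) -> list:
    """Stage 2 (sketch): cap clip length as a crude placeholder for rhythm control."""
    return [Clip(c.description, min(c.duration, max_duration)) for c in clips]


def make_film(scenes: list, max_duration: float = 5.0) -> list:
    """Chain the two stages: generation first, then post-production."""
    return generative_post_production(reference_guided_generation(scenes), max_duration)
```

Calling `make_film(["aerial shot of the city at dawn", "chase through the market"])` returns two clips. The first borrows the aerial reference and is trimmed from 6.0 to 5.0 seconds, while the second keeps its 3.0 second duration.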
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper tackles an important and underexplored area—bridging cinematic principles and AI-based film generation. The topic is both academically relevant and practically impactful. 2. The authors explicitly ground their work in film principles (camera language, cinematic rhythm, audience perception, etc.) and emulate professional filmmaking workflows. This fills an existing academic gap between generative modeling and film studies. 3. The scene-level retrieval and coordinated camera planning
1. While the system design is well-engineered, it lacks a deep technical innovation or theoretical insight at the algorithmic level. I would prefer to see more technically substantive modules rather than a purely workflow-oriented system. 2. The system’s multi-stage pipeline (retrieval -> shot planning -> rough cut -> audience feedback -> fine cut -> sound production) introduces fragility. The paper doesn’t investigate how errors propagate or how robust the system is when upstream stages fail. 3
- The overall method is easy to follow. - The visualizations in the supplementary materials offer clear qualitative support for the method's effectiveness.
**1. Limited Novelty** I do not typically raise concerns about novelty lightly, but I must state that the technical contribution of this paper is highly limited. At its core, the proposed method is a relatively straightforward application of RAG. Crucial generative capabilities, such as identity preservation and high-fidelity video synthesis, appear to be inherited from the underlying foundational models used, rather than being novel contributions of the FilMaster framework itself. While the ac
1. An automatic end-to-end clip-level video generation agent with impressive performance. 2. The Coordination Stage can edit the order and duration of the generated videos, which is reasonable because the clips do not have a strict temporal order constraint. The audio fusion approach is also intuitive and suitable for agent-like methods (retrieval and synchronization). 3. The whole paper is well written and easy to understand.
The performance reported in this paper is strong; my main concern is whether the comparison with existing methods is fair. This paper uses Kling 1.6 as the video generation model, which is much stronger than the open-source models used by previous methods, such as CogVideoX and LTX-Video. Is the performance improvement simply due to Kling's base capability? A fair comparison with MovieAgent is recommended to isolate the contribution of this paper.
