ChatCam: Empowering Camera Control through Conversational AI
Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang

TL;DR
ChatCam leverages conversational AI and a GPT-based model to enable intuitive, language-guided camera control for cinematic video creation, bridging AI perception with professional filmmaking workflows.
Contribution
Introduces ChatCam, a novel system combining language models and camera trajectory tools for natural language-driven camera control in video production.
Findings
Effective interpretation of complex user instructions.
Comparable or superior performance to state-of-the-art methods.
Positive user study feedback on usability and accuracy.
Abstract
Cinematographers adeptly capture the essence of the world, crafting compelling visual narratives through intricate camera movements. Witnessing the strides made by large language models in perceiving and interacting with the 3D world, this study explores their capability to control cameras with human language guidance. We introduce ChatCam, a system that navigates camera movements through conversations with users, mimicking a professional cinematographer's workflow. To achieve this, we propose CineGPT, a GPT-based autoregressive model for text-conditioned camera trajectory generation. We also develop an Anchor Determinator to ensure precise camera trajectory placement. ChatCam understands user requests and employs our proposed tools to generate trajectories, which can be used to render high-quality video footage on radiance field representations. Our experiments, including comparisons…
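As a rough illustration of the pipeline the abstract describes (autoregressive, text-conditioned trajectory generation followed by anchor-based placement), the toy sketch below may help. It is not the authors' implementation: the token codebook, the `EOS` value, `toy_model`, and every function name are hypothetical stand-ins chosen for illustration, and a real system would use a trained model and full camera poses rather than axis-aligned position steps.

```python
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

EOS = -1  # hypothetical end-of-sequence token (assumption, not from the paper)


def generate_tokens(prompt_tokens: Sequence[int],
                    next_token: Callable[[List[int]], int],
                    max_len: int = 64) -> List[int]:
    """Greedy autoregressive decoding: each new token is appended to the
    context before the next one is predicted."""
    seq = list(prompt_tokens)
    out: List[int] = []
    for _ in range(max_len):
        tok = next_token(seq)
        if tok == EOS:
            break
        seq.append(tok)
        out.append(tok)
    return out


def detokenize(tokens: Sequence[int], step: float = 0.5) -> List[Vec3]:
    """Toy codebook: token 0/1/2 moves the camera by `step` along x/y/z.
    Returns the accumulated positions, starting at the origin."""
    moves = {0: (step, 0.0, 0.0), 1: (0.0, step, 0.0), 2: (0.0, 0.0, step)}
    pos: Vec3 = (0.0, 0.0, 0.0)
    path = [pos]
    for t in tokens:
        dx, dy, dz = moves[t]
        pos = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
        path.append(pos)
    return path


def place_at_anchor(path: Sequence[Vec3], anchor: Vec3) -> List[Vec3]:
    """Translate a trajectory generated around the origin so that it
    starts at the chosen anchor position in the scene."""
    ax, ay, az = anchor
    return [(x + ax, y + ay, z + az) for x, y, z in path]


def toy_model(seq: List[int]) -> int:
    """Stand-in for a trained model: emit token 0 until the context
    holds four tokens, then stop."""
    return 0 if len(seq) < 4 else EOS


tokens = generate_tokens([9], toy_model)   # [9] plays the role of an encoded text prompt
trajectory = place_at_anchor(detokenize(tokens), anchor=(1.0, 2.0, 0.0))
```

Here the generated tokens describe a short move along the x axis, and the anchor step shifts that move to start at (1.0, 2.0, 0.0). Keeping generation and placement separate mirrors the split the abstract draws between trajectory generation and anchor determination.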
Taxonomy
Topics: AI in Service Interactions · Digital Mental Health Interventions
