PreciseCam: Precise Camera Control for Text-to-Image Generation

Edurne Bernal-Berdun; Ana Serrano; Belen Masia; Matheus Gadelha; Yannick Hold-Geoffroy; Xin Sun; Diego Gutierrez

arXiv:2501.12910 · cs.CV · January 23, 2025

TL;DR

PreciseCam introduces a method for accurate camera control in text-to-image generation using only four camera parameters, supported by a new dataset, enhancing image realism without complex geometry or multi-view data.

Contribution

PreciseCam provides an efficient approach to precise camera control in text-to-image models, eliminating the need for pre-existing geometry, reference 3D objects, or multi-view data.

Findings

01. Achieves precise camera control, surpassing prompt engineering approaches

02. Introduces a dataset of more than 57,000 images with text prompts and ground-truth camera parameters

03. Demonstrates improved realism in generated images

Abstract

Images as an artistic medium often rely on specific camera angles and lens distortions to convey ideas or emotions; however, such precise control is missing in current text-to-image models. We propose an efficient and general solution that allows precise control over the camera when generating both photographic and artistic images. Unlike prior methods that rely on predefined shots, we rely solely on four simple extrinsic and intrinsic camera parameters, removing the need for pre-existing geometry, reference 3D objects, and multi-view data. We also present a novel dataset with more than 57,000 images, along with their text prompts and ground-truth camera parameters. Our evaluation shows precise camera control in text-to-image generation, surpassing traditional prompt engineering approaches. Our data, model, and code are publicly available at…
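The abstract does not name the four camera parameters, so the following is only an illustrative sketch of what a four-parameter camera description might look like. The field names (`roll_deg`, `pitch_deg`, `vfov_deg`, `distortion`) and the helper `focal_length_px` are assumptions made for this example, not the authors' API. The one firm relationship used is the standard pinhole formula relating vertical field of view to focal length in pixels.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraParams:
    """Hypothetical four-parameter camera record (field names are assumptions)."""
    roll_deg: float    # extrinsic: rotation about the optical axis
    pitch_deg: float   # extrinsic: tilt up or down relative to the horizon
    vfov_deg: float    # intrinsic: vertical field of view
    distortion: float  # intrinsic: lens distortion strength, 0 means none

    def __post_init__(self):
        # Basic sanity checks; the ranges are illustrative, not from the paper.
        if not 0.0 < self.vfov_deg < 180.0:
            raise ValueError("vfov_deg must lie strictly between 0 and 180")
        if not -90.0 <= self.pitch_deg <= 90.0:
            raise ValueError("pitch_deg must lie between -90 and 90")


def focal_length_px(cam: CameraParams, image_height: int) -> float:
    """Pinhole focal length in pixels: f = (H / 2) / tan(vfov / 2)."""
    half_fov = math.radians(cam.vfov_deg) / 2.0
    return (image_height / 2.0) / math.tan(half_fov)
```

For instance, under these assumptions a camera with a 90° vertical field of view on a 1024-pixel-tall image has a focal length of roughly 512 pixels; narrowing the field of view lengthens the focal length.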

Code & Models

Models

🤗 edurnebb/PreciseCam · model · 3 dl

Taxonomy

Topics: Video Analysis and Summarization · Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques