Proportion and Perspective Control for Flow-Based Image Generation
Julien Boudier, Hugo Caselles-Dupré

TL;DR
This paper introduces two specialized ControlNets for flow-based image generation: one controls the position and scale of objects through bounding boxes, and the other controls scene perspective through vanishing lines.
Contribution
The paper presents two ControlNets, one for object proportions and one for perspective in image synthesis. Their training is supported by data pipelines that use vision-language models for annotation and specialized algorithms to synthesize the conditioning images.
Findings
The proportion ControlNet gives effective control over object position and scale
The perspective ControlNet successfully steers the 3D geometry of the scene
Both modules show limitations under complex constraints
Abstract
While modern text-to-image diffusion models generate high-fidelity images, they offer limited control over the spatial and geometric structure of the output. To address this, we introduce and evaluate two ControlNets specialized for artistic control: (1) a proportion ControlNet that uses bounding boxes to dictate the position and scale of objects, and (2) a perspective ControlNet that employs vanishing lines to control the 3D geometry of the scene. We support the training of these modules with data pipelines that leverage vision-language models for annotation and specialized algorithms for conditioning image synthesis. Our experiments demonstrate that both modules provide effective control but exhibit limitations with complex constraints. Both models are released on HuggingFace: https://huggingface.co/obvious-research
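To make the two conditioning signals in the abstract concrete, the following Python sketch rasterizes bounding boxes and vanishing lines into single-channel conditioning images. It is not the authors' pipeline: the function names (`render_box_condition`, `render_vanishing_lines`), the normalized box format, the outline rendering, and the evenly spaced lines fanning out from a single vanishing point are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Canvas = List[List[int]]                  # rows of pixel values (0 or 255)
Box = Tuple[float, float, float, float]   # hypothetical format: normalized (x0, y0, x1, y1)


def blank_canvas(width: int, height: int) -> Canvas:
    """Return a black single-channel image as a list of rows."""
    return [[0] * width for _ in range(height)]


def render_box_condition(boxes: Sequence[Box], width: int, height: int,
                         thickness: int = 1) -> Canvas:
    """Draw each normalized box as a white outline (assumed encoding)."""
    canvas = blank_canvas(width, height)
    for x0, y0, x1, y1 in boxes:
        # Map normalized coordinates to pixel indices and clamp to the canvas.
        left = max(0, min(width - 1, round(x0 * (width - 1))))
        right = max(0, min(width - 1, round(x1 * (width - 1))))
        top = max(0, min(height - 1, round(y0 * (height - 1))))
        bottom = max(0, min(height - 1, round(y1 * (height - 1))))
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                on_border = (x - left < thickness or right - x < thickness or
                             y - top < thickness or bottom - y < thickness)
                if on_border:
                    canvas[y][x] = 255
    return canvas


def draw_line(canvas: Canvas, x0: int, y0: int, x1: int, y1: int) -> None:
    """Bresenham line rasterization; pixels outside the canvas are skipped."""
    height, width = len(canvas), len(canvas[0])
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        if 0 <= x0 < width and 0 <= y0 < height:
            canvas[y0][x0] = 255
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_vanishing_lines(vanishing_point: Tuple[int, int], width: int,
                           height: int, n_lines: int = 8) -> Canvas:
    """Draw n_lines evenly spaced rays from one vanishing point (assumed encoding)."""
    canvas = blank_canvas(width, height)
    vx, vy = vanishing_point
    reach = width + height  # long enough to leave the canvas from any interior point
    for i in range(n_lines):
        angle = 2 * math.pi * i / n_lines
        end_x = vx + round(reach * math.cos(angle))
        end_y = vy + round(reach * math.sin(angle))
        draw_line(canvas, vx, vy, end_x, end_y)
    return canvas


if __name__ == "__main__":
    box_image = render_box_condition([(0.25, 0.25, 0.75, 0.75)], 64, 64)
    line_image = render_vanishing_lines((32, 20), 64, 64, n_lines=12)
    print(sum(map(sum, box_image)) // 255, "box pixels set")
    print(sum(map(sum, line_image)) // 255, "line pixels set")
```

In a real setup, images like these would be passed to the corresponding ControlNet alongside the text prompt. The exact encoding each released model expects (colors, line widths, number of vanishing points) should be taken from the model cards rather than from this sketch.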
