Proportion and Perspective Control for Flow-Based Image Generation

Julien Boudier; Hugo Caselles-Dupré

arXiv:2510.21763·cs.CV·October 28, 2025

TL;DR

This paper introduces two specialized ControlNets for flow-based image generation that give artists control over spatial and geometric structure: one sets object position and scale through bounding boxes, and the other sets scene perspective through vanishing lines.

Contribution

The paper presents two ControlNets, one for object proportions and one for perspective, supported by training data pipelines that use vision-language models for annotation and specialized algorithms for conditioning image synthesis.

Findings

1. Effective control over object position and scale
2. Successful manipulation of scene perspective
3. Limitations with complex constraints

Abstract

While modern text-to-image diffusion models generate high-fidelity images, they offer limited control over the spatial and geometric structure of the output. To address this, we introduce and evaluate two ControlNets specialized for artistic control: (1) a proportion ControlNet that uses bounding boxes to dictate the position and scale of objects, and (2) a perspective ControlNet that employs vanishing lines to control the 3D geometry of the scene. We support the training of these modules with data pipelines that leverage vision-language models for annotation and specialized algorithms for conditioning image synthesis. Our experiments demonstrate that both modules provide effective control but exhibit limitations with complex constraints. Both models are released on HuggingFace: https://huggingface.co/obvious-research
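To make the two conditioning signals concrete, here is a minimal sketch of how bounding boxes and vanishing lines might be turned into conditioning inputs. This page does not describe the paper's actual conditioning format, so everything below is an assumption made for illustration: the function names `render_boxes` and `vanishing_lines`, the normalized `(x0, y0, x1, y1)` box convention, the binary outline canvas, and the evenly spaced line fan are all hypothetical and are not taken from the released models.

```python
import math


def render_boxes(width, height, boxes, thickness=1):
    """Rasterize normalized (x0, y0, x1, y1) boxes as outlines on a blank grid.

    Hypothetical helper: the real proportion ControlNet may encode boxes
    differently (filled regions, colors per object, and so on).
    """
    canvas = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Map normalized coordinates to pixel indices, clamped to the canvas.
        left = max(0, min(width - 1, round(x0 * (width - 1))))
        right = max(0, min(width - 1, round(x1 * (width - 1))))
        top = max(0, min(height - 1, round(y0 * (height - 1))))
        bottom = max(0, min(height - 1, round(y1 * (height - 1))))
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                # A pixel is drawn only if it lies within `thickness` of an edge.
                on_edge = (
                    x - left < thickness
                    or right - x < thickness
                    or y - top < thickness
                    or bottom - y < thickness
                )
                if on_edge:
                    canvas[y][x] = 1
    return canvas


def vanishing_lines(vanishing_point, width, height, n_lines=8):
    """Return line segments fanning out from a vanishing point.

    Hypothetical helper: each segment is ((x_start, y_start), (x_end, y_end)),
    with directions spaced evenly around the full circle.
    """
    vx, vy = vanishing_point
    # The canvas diagonal is long enough to cross the image from any interior point.
    reach = math.hypot(width, height)
    segments = []
    for k in range(n_lines):
        angle = 2 * math.pi * k / n_lines
        end = (vx + reach * math.cos(angle), vy + reach * math.sin(angle))
        segments.append(((vx, vy), end))
    return segments


if __name__ == "__main__":
    # One box covering the central half of a small canvas.
    mask = render_boxes(16, 16, [(0.25, 0.25, 0.75, 0.75)])
    print("\n".join("".join("#" if v else "." for v in row) for row in mask))
    # Eight lines converging on the image center.
    for segment in vanishing_lines((8, 8), 16, 16):
        print(segment)
```

In a setup like this, the box canvas would carry the position and scale constraints and the line segments, once drawn onto an image, would carry the perspective constraint. How the released ControlNets consume such images is documented with the models on HuggingFace rather than here.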
