LACONIC: A 3D Layout Adapter for Controllable Image Creation

Léopold Maillard; Tom Durand; Adrien Ramanana Rahary; Maks Ovsjanikov

arXiv:2507.03257·cs.CV·August 5, 2025

Open Access

TL;DR

LACONIC introduces a 3D-aware adapter for pretrained diffusion models, enabling controllable, scene-consistent image synthesis with camera and geometry controls, while supporting scene editing and generalization.

Contribution

We propose a novel 3D-aware conditioning method and adapter network that enhances pretrained text-to-image models with scene and object-level 3D control.

Findings

01. Supports camera and explicit 3D geometry conditioning

02. Enables scene editing like object positioning and resizing

03. Shows strong generalization with lightweight design

Abstract

Existing generative approaches for guided image synthesis of multi-object scenes typically rely on 2D controls in the image or text space. As a result, these methods struggle to maintain and respect the consistent three-dimensional geometric structure underlying the scene. In this paper, we propose a novel conditioning approach, training method and adapter network that can be plugged into pretrained text-to-image diffusion models. Our approach endows such models with 3D-awareness while leveraging their rich prior knowledge. Our method supports camera control and conditioning on explicit 3D geometries and, for the first time, accounts for the entire context of a scene, i.e., both on-screen and off-screen items, to synthesize plausible and semantically rich images. Despite its multi-modal nature, our model is lightweight, requires a reasonable amount of data for supervised learning…
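To make the distinction between on-screen and off-screen items concrete, here is a minimal geometric sketch. It is not the paper's conditioning method, which the abstract does not specify. It only shows standard pinhole projection under assumed conventions: the camera looks along +z, extrinsics map world to camera as x_cam = R x_world + t, and the function names (`world_to_camera`, `project_pinhole`, `on_screen_mask`) are hypothetical.

```python
import numpy as np

def world_to_camera(points_world, R, t):
    """Map Nx3 world points into the camera frame: x_cam = R @ x_world + t."""
    return points_world @ R.T + t

def project_pinhole(points_cam, fx, fy, cx, cy):
    """Standard pinhole projection; returns pixel coordinates (Nx2) and depth (N,)."""
    z = points_cam[:, 2]
    # Avoid division by zero for points lying exactly on the camera plane.
    safe_z = np.where(np.abs(z) < 1e-8, 1e-8, z)
    u = fx * points_cam[:, 0] / safe_z + cx
    v = fy * points_cam[:, 1] / safe_z + cy
    return np.stack([u, v], axis=1), z

def on_screen_mask(points_world, R, t, fx, fy, cx, cy, width, height):
    """True where a point is in front of the camera and projects inside the image."""
    cam = world_to_camera(points_world, R, t)
    uv, z = project_pinhole(cam, fx, fy, cx, cy)
    in_front = z > 0
    in_frame = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return in_front & in_frame

# Illustrative object centers (made-up values): one visible, one far to the
# side, one behind the camera.
centers = np.array([[0.0, 0.0, 2.0], [5.0, 0.0, 2.0], [0.0, 0.0, -1.0]])
mask = on_screen_mask(
    centers, R=np.eye(3), t=np.zeros(3),
    fx=100.0, fy=100.0, cx=64.0, cy=64.0, width=128, height=128,
)
```

With these values, only the first center falls inside the 128x128 frame. A scene-level conditioning signal in the sense of the abstract would keep all three objects as context rather than discarding the two that are off-screen.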

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Advanced Image and Video Retrieval Techniques