Ar2Can: An Architect and an Artist Leveraging a Canvas for Multi-Human Generation

Shubhankar Borse; Phuc Pham; Farzad Farhadzadeh; Seokeon Choi; Phong Ha Nguyen; Anh Tuan Tran; Sungrack Yun; Munawar Hayat; Fatih Porikli

arXiv:2511.22690 · cs.CV · April 2, 2026

TL;DR

Ar2Can is a two-stage framework for multi-human image generation that plans the spatial layout and renders identities in separate stages, improving count accuracy and identity preservation without using real multi-human images.

Contribution

The paper introduces a disentangled two-stage approach trained with a spatially-grounded face matching reward, which improves both the quality of generated multi-human scenes and the preservation of each reference identity.
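The abstract describes this reward as combining Hungarian spatial alignment with identity similarity. The sketch below illustrates one way such a reward could be put together; it is not the authors' implementation. The box format `(x0, y0, x1, y1)`, the center-distance cost, the cosine similarity on face embeddings, the zero score for unmatched slots, and every function name are assumptions made for illustration. To stay dependency-free, the optimal one-to-one assignment (the result the Hungarian algorithm computes) is found by brute force over permutations, which is only practical for a handful of faces; `scipy.optimize.linear_sum_assignment` would be the usual choice at scale.

```python
from itertools import permutations
from math import sqrt


def center_distance(box_a, box_b):
    """Euclidean distance between the centers of two (x0, y0, x1, y1) boxes."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return sqrt((ax - bx) ** 2 + (ay - by) ** 2)


def cosine(u, v):
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_faces(layout_boxes, detected_boxes):
    """Minimum-cost one-to-one matching of layout slots to detected faces.

    Brute-force stand-in for the Hungarian algorithm (assumption: few faces).
    Returns sorted (slot_index, detection_index) pairs.
    """
    n, m = len(layout_boxes), len(detected_boxes)
    if n == 0 or m == 0:
        return []
    if n <= m:
        candidates = ([(i, p[i]) for i in range(n)]
                      for p in permutations(range(m), n))
    else:
        candidates = ([(p[j], j) for j in range(m)]
                      for p in permutations(range(n), m))
    best_pairs, best_cost = [], float("inf")
    for pairs in candidates:
        cost = sum(center_distance(layout_boxes[i], detected_boxes[j])
                   for i, j in pairs)
        if cost < best_cost:
            best_pairs, best_cost = pairs, cost
    return sorted(best_pairs)


def face_matching_reward(layout_boxes, ref_embeddings,
                         detected_boxes, detected_embeddings):
    """Mean identity similarity over layout slots after spatial matching.

    Assumption: a slot with no matched face contributes 0 to the mean.
    """
    if not layout_boxes:
        return 0.0
    pairs = match_faces(layout_boxes, detected_boxes)
    total = sum(cosine(ref_embeddings[i], detected_embeddings[j])
                for i, j in pairs)
    return total / len(layout_boxes)
```

Matching on position first and only then comparing identities means a face that looks right but sits in the wrong slot earns no credit for that slot, which is the behavior the phrase "rendered at correct locations" suggests.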

Findings

01. Significant improvements in count accuracy and identity preservation.

02. High perceptual quality of generated images.

03. Effective use of synthetic data without real multi-human images.

Abstract

Despite recent advances in personalized image generation, existing models consistently fail to produce reliable multi-human scenes, often merging or losing facial identities. We present Ar2Can, a novel two-stage framework that disentangles spatial planning from identity rendering for multi-human generation. The Architect predicts structured layouts, specifying where each person should appear. The Artist then synthesizes photorealistic images, guided by a spatially-grounded face matching reward that combines Hungarian spatial alignment with identity similarity. This approach ensures that faces are rendered at the correct locations and faithfully preserve the reference identities. We develop two Architect variants, seamlessly integrated with our diffusion-based Artist model. The Artist is optimized via Group Relative Policy Optimization (GRPO) using compositional rewards for count accuracy, image quality, and…
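To make the GRPO step concrete, here is a minimal sketch of how a compositional reward and group-relative advantages are commonly computed. It follows the standard GRPO recipe (normalize each sample's reward by the mean and standard deviation of its group) rather than anything confirmed about this paper's training code. The weighted-sum combination, the weights, the `eps` constant, and the function names are assumptions. The abstract is truncated after "image quality, and…", so the sketch accepts arbitrary named reward terms instead of guessing the remaining one.

```python
from statistics import fmean, pstdev


def combine_rewards(terms, weights):
    """Weighted sum of named reward terms (weights are hypothetical)."""
    return sum(weights[name] * value for name, value in terms.items())


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO normalization within one group of samples.

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation; eps avoids division by zero when all
    samples in the group score the same.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: four images sampled for the same prompt, each scored on two terms.
weights = {"count": 0.5, "quality": 0.5}  # hypothetical weights
group = [
    {"count": 1.0, "quality": 0.8},
    {"count": 0.0, "quality": 0.9},
    {"count": 1.0, "quality": 0.4},
    {"count": 0.0, "quality": 0.3},
]
rewards = [combine_rewards(terms, weights) for terms in group]
advantages = group_relative_advantages(rewards)
```

Because advantages are relative to the group, a sample is reinforced only when it beats its siblings generated from the same prompt, so no separate value network is needed.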

Code & Models

Repositories

https://qualcomm-ai-research.github.io/ar2can
github
