When Identities Collapse: A Stress-Test Benchmark for Multi-Subject Personalization
Zhihan Chen, Yuhuan Zhao, Yijie Zhu, Xinyu Yao

TL;DR
This paper introduces a stress-test benchmark and a new metric to evaluate and expose the failure modes of multi-subject personalization in text-to-image diffusion models, revealing severe identity collapse as subject count and interaction difficulty increase.
Contribution
The paper presents a novel benchmark and the Subject Collapse Rate (SCR) metric to systematically evaluate multi-subject identity preservation, exposing the limitations of current models and metrics.
Findings
Models struggle with identity preservation as subject count increases beyond 4.
Global CLIP metrics are insensitive to local identity collapse, which makes them unreliable for multi-subject evaluation.
The Subject Collapse Rate (SCR) metric detects local identity collapse, with SCR approaching 100% at 10 subjects.
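The contrast between a global score and a per-subject collapse rate can be illustrated with a toy sketch. The definition below is an assumption made for illustration only: this summary does not give the paper's SCR formula, so `collapse_rate`, its `threshold` parameter, and the similarity scores are all hypothetical. The sketch treats a subject as "collapsed" when its similarity to its reference falls below a threshold, and compares that fraction with a simple average over the same scores.

```python
from typing import Sequence


def collapse_rate(per_subject_similarity: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of subjects whose similarity to their reference is below `threshold`.

    Hypothetical definition for illustration; not the paper's SCR formula.
    """
    if not per_subject_similarity:
        raise ValueError("need at least one subject")
    collapsed = sum(1 for s in per_subject_similarity if s < threshold)
    return collapsed / len(per_subject_similarity)


def global_mean(per_subject_similarity: Sequence[float]) -> float:
    """Single averaged score, standing in for a global image-level metric."""
    if not per_subject_similarity:
        raise ValueError("need at least one subject")
    return sum(per_subject_similarity) / len(per_subject_similarity)


if __name__ == "__main__":
    # Made-up scores: two subjects are preserved, two have lost their identity.
    scores = [0.9, 0.9, 0.3, 0.3]
    print("global mean:", global_mean(scores))
    print("collapse rate:", collapse_rate(scores))
```

In this toy case the averaged score stays above the threshold even though half of the subjects have collapsed, which is the kind of masking effect a per-subject rate is meant to surface.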
Abstract
Subject-driven text-to-image diffusion models have achieved remarkable success in preserving single identities, yet their ability to compose multiple interacting subjects remains largely unexplored and highly challenging. Existing evaluation protocols typically rely on global CLIP metrics, which are insensitive to local identity collapse and fail to capture the severity of multi-subject entanglement. In this paper, we identify a pervasive "Illusion of Scalability" in current models: while they excel at synthesizing 2-4 subjects in simple layouts, they suffer from catastrophic identity collapse when scaled to 6-10 subjects or tasked with complex physical interactions. To systematically expose this failure mode, we construct a rigorous stress-test benchmark comprising 75 prompts distributed across varying subject counts and interaction difficulties (Neutral, Occlusion, Interaction).…
