Say Cheese! Detail-Preserving Portrait Collection Generation via Natural Language Edits
Zelong Sun, Jiahui Wu, Ying Ba, Dong Jing, Zhiwu Lu

TL;DR
This paper introduces a new task called Portrait Collection Generation (PCG) that creates coherent portrait collections from natural language edits, addressing complex multi-attribute modifications and detail preservation, and proposes a large dataset and a novel framework for this purpose.
Contribution
The paper presents the first large-scale PCG dataset and a novel framework, SCheese, for high-fidelity, detail-preserving portrait collection generation from natural language instructions.
Findings
The CHEESE dataset contains 24K portrait collections and 573K samples.
SCheese achieves state-of-the-art performance on PCG tasks.
The framework preserves identity and fine details such as clothing and accessories during generation.
Abstract
As social media platforms proliferate, users increasingly demand intuitive ways to create diverse, high-quality portrait collections. In this work, we introduce Portrait Collection Generation (PCG), a novel task that generates coherent portrait collections by editing a reference portrait image through natural language instructions. This task poses two unique challenges to existing methods: (1) complex multi-attribute modifications such as pose, spatial layout, and camera viewpoint; and (2) high-fidelity detail preservation including identity, clothing, and accessories. To address these challenges, we propose CHEESE, the first large-scale PCG dataset containing 24K portrait collections and 573K samples with high-quality modification text annotations, constructed through a Large Vision-Language Model-based pipeline with inversion-based verification. We further propose SCheese, a framework…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis
