Say Cheese! Detail-Preserving Portrait Collection Generation via Natural Language Edits

Zelong Sun; Jiahui Wu; Ying Ba; Dong Jing; Zhiwu Lu

arXiv:2601.20511 · cs.CV · January 29, 2026

TL;DR

This paper introduces Portrait Collection Generation (PCG), a new task that creates coherent portrait collections by editing a reference portrait with natural language instructions. The task must handle complex multi-attribute modifications while preserving fine details, and the paper contributes a large-scale dataset and a novel framework for it.

Contribution

The paper presents the first large-scale PCG dataset and a novel framework, SCheese, for high-fidelity, detail-preserving portrait collection generation from natural language instructions.

Findings

1. The CHEESE dataset contains 24K portrait collections and 573K samples.
2. SCheese achieves state-of-the-art performance on PCG tasks.
3. The framework effectively preserves identity and details during generation.

Abstract

As social media platforms proliferate, users increasingly demand intuitive ways to create diverse, high-quality portrait collections. In this work, we introduce Portrait Collection Generation (PCG), a novel task that generates coherent portrait collections by editing a reference portrait image through natural language instructions. This task poses two unique challenges to existing methods: (1) complex multi-attribute modifications such as pose, spatial layout, and camera viewpoint; and (2) high-fidelity detail preservation, including identity, clothing, and accessories. To address these challenges, we propose CHEESE, the first large-scale PCG dataset, containing 24K portrait collections and 573K samples with high-quality modification text annotations, constructed through a pipeline based on a Large Vision-Language Model with inversion-based verification. We further propose SCheese, a framework…
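To make the PCG setup more concrete, here is a minimal Python sketch of how a portrait collection could be represented. It assumes, hypothetically, that each sample pairs one natural-language edit with one target portrait derived from a shared reference image. The names `PortraitSample`, `PortraitCollection`, `add_edit`, and `mean_collection_size`, and all field names and file paths, are illustrative only and are not taken from the CHEESE release. The helper also estimates the average collection size implied by the rounded headline counts (24K collections, 573K samples).

```python
from dataclasses import dataclass, field


@dataclass
class PortraitSample:
    # Hypothetical fields: one natural-language edit and the portrait it should produce.
    instruction: str
    target_image: str


@dataclass
class PortraitCollection:
    # Hypothetical structure: every sample edits the same reference portrait, so identity,
    # clothing, and accessories are expected to stay consistent across the collection.
    reference_image: str
    samples: list[PortraitSample] = field(default_factory=list)

    def add_edit(self, instruction: str, target_image: str) -> None:
        """Append one (instruction, target) pair to the collection."""
        self.samples.append(PortraitSample(instruction, target_image))

    def __len__(self) -> int:
        return len(self.samples)


def mean_collection_size(num_samples: int, num_collections: int) -> float:
    """Average number of samples per collection."""
    if num_collections <= 0:
        raise ValueError("num_collections must be positive")
    return num_samples / num_collections


# Usage: a toy collection with two multi-attribute edits of one reference portrait
# (file names are placeholders).
collection = PortraitCollection("ref_0001.jpg")
collection.add_edit("turn her head to the left and raise one hand", "edit_0001_a.jpg")
collection.add_edit("zoom out to a full-body shot from a low camera angle", "edit_0001_b.jpg")

# The published counts are rounded, so this is only an approximation (about 23.9).
approx = mean_collection_size(573_000, 24_000)
```

Grouping samples under a single reference image mirrors the task definition in the abstract: every edit in a collection starts from the same portrait, which is what makes cross-sample detail preservation measurable.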

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis