Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods

Panos Achlioptas; Alexandros Benetatos; Iordanis Fostiropoulos; Dimitris Skourtis

arXiv:2312.06116·cs.CV·December 12, 2023·1 cites

Open Access

TL;DR

This paper introduces Stellar, a comprehensive dataset and evaluation framework for personalized text-to-image generation, along with a new baseline that outperforms existing methods without test-time fine-tuning.

Contribution

The work provides a large, high-quality dataset (Stellar), new metrics aligned with human judgment, and a simple baseline that achieves state-of-the-art results in personalized text-to-image synthesis.

Findings

01

The Stellar dataset is an order of magnitude larger than existing relevant datasets.

02

New metrics better correlate with human judgment.

03

The proposed baseline outperforms previous methods in quality and efficiency.

Abstract

In this work, we systematically study the problem of personalized text-to-image generation, where the output image is expected to portray information about specific human subjects, e.g., generating images of oneself appearing in imaginative places, interacting with various items, or engaging in fictional activities. To this end, we focus on text-to-image systems that take as input a single image of an individual to ground the generation process, along with text describing the desired visual context. Our first contribution is to fill the literature gap by curating high-quality, appropriate data for this task. Namely, we introduce a standardized dataset (Stellar) of personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets, and rich semantic ground-truth annotations are readily available for it. Having established Stellar…

Taxonomy

Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Focus