T-Person-GAN: Text-to-Person Image Generation with Identity-Consistency and Manifold Mix-Up
Deyin Liu, Lin Yuanbo Wu, Bo Li, Zongyuan Ge

TL;DR
This paper introduces T-Person-GAN, a model that generates high-resolution person images from text alone, aiming for identity consistency across images of the same person and robustness to inter-person variations.
Contribution
The paper proposes two innovative mechanisms, identity-preserving regularization and manifold mix-up, to improve identity consistency and discriminability in text-to-person image generation.
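As a rough illustration of the general idea behind manifold mix-up, the sketch below linearly interpolates a batch of hidden feature vectors and their labels, using a mixing weight drawn from a Beta distribution. It is a generic sketch, not the authors' implementation. The function name `manifold_mixup`, the `alpha` parameter, and the use of one-hot identity labels are assumptions made for illustration. The summary above does not say where in the network, or on which tensors, the paper applies the interpolation.

```python
import numpy as np


def manifold_mixup(features, labels, alpha=2.0, rng=None):
    """Generic mix-up on hidden features (hypothetical helper, not from the paper).

    features: array of shape (batch, dim), hidden representations.
    labels:   array of shape (batch, num_classes), one-hot or soft labels.
    alpha:    Beta(alpha, alpha) concentration; assumed value for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw a single mixing weight in [0, 1] for the whole batch.
    lam = rng.beta(alpha, alpha)
    # Pair every sample with another sample from a random permutation of the batch.
    perm = rng.permutation(features.shape[0])
    mixed_features = lam * features + (1.0 - lam) * features[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_features, mixed_labels, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 8))      # 4 samples, 8-dim hidden features
    ids = np.eye(3)[[0, 1, 2, 0]]        # one-hot labels for 3 identities
    mixed_feats, mixed_ids, lam = manifold_mixup(feats, ids, rng=rng)
    print(mixed_feats.shape, mixed_ids.shape)
```

Because the labels are mixed with the same weight as the features, each mixed label row still sums to one, so a classifier or discriminator head can be trained against it as a soft target.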
Findings
Significant improvement over existing models in generating person images from text.
Effective preservation of identity features across generated images.
Robustness against inter-person variations demonstrated.
Abstract
In this paper, we present an end-to-end approach to generate high-resolution person images conditioned on text only. State-of-the-art text-to-image generation models are mainly designed for center-object generation, e.g., flowers and birds. Unlike center-placed objects with similar shapes and orientations, person image generation is a more challenging task, for which we observe the following: 1) the generated images for the same person should exhibit visual details with identity consistency, e.g., identity-related textures/clothes/shoes across the images, and 2) those images should be discriminative, so that they are robust against the inter-person variations caused by visual ambiguities. To address the above challenges, we develop an effective generative model to produce person images with two novel mechanisms. In particular, our first mechanism (called T-Person-GAN-ID) is to integrate the one-stream…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Pose and Action Recognition
