TextGaze: Gaze-Controllable Face Generation with Natural Language
Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang

TL;DR
This paper introduces TextGaze, a novel method for generating face images controllable by natural language descriptions of gaze and head pose, using a new dataset and diffusion models.
Contribution
It presents a new text-based gaze-controllable face generation task, a text-of-gaze dataset of over 90k descriptions, and a diffusion-based method that combines a sketch-conditioned face diffusion module with a model-based sketch diffusion module.
Findings
Effective face generation from text descriptions of gaze.
Successful application on FFHQ dataset.
Availability of dataset and code for future research.
Abstract
Generating face images with specific gaze information has attracted considerable attention. Existing approaches typically input gaze values directly for face generation, which is unnatural and requires annotated gaze datasets for training, thereby limiting their application. In this paper, we present a novel gaze-controllable face generation task. Our approach takes textual descriptions of human gaze and head behavior as input and generates corresponding face images. Our work first introduces a text-of-gaze dataset containing over 90k text descriptions spanning a dense distribution of gaze and head poses. We further propose a gaze-controllable text-to-face method. Our method contains a sketch-conditioned face diffusion module and a model-based sketch diffusion module. We define a face sketch based on facial landmarks and an eye segmentation map. The face diffusion module generates face…
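To make the "face sketch" idea concrete, here is a minimal illustration of how facial landmarks and an eye segmentation map could be combined into a single conditioning image. This is only a sketch under stated assumptions: the function name `make_face_sketch`, the 64×64 resolution, and the intensity values 255 and 128 are hypothetical choices for illustration, not the representation used in the paper.

```python
import numpy as np

def make_face_sketch(landmarks, eye_mask, size=64):
    """Combine landmarks and an eye mask into one grayscale sketch.

    landmarks: (N, 2) array-like of (x, y) pixel coordinates (assumed format).
    eye_mask:  (size, size) array, nonzero where a pixel belongs to an eye region.
    Returns a (size, size) uint8 image.
    """
    sketch = np.zeros((size, size), dtype=np.uint8)

    # Round landmarks to the nearest pixel and drop any that fall outside the image.
    pts = np.rint(np.asarray(landmarks, dtype=float)).astype(int)
    inside = (
        (pts[:, 0] >= 0) & (pts[:, 0] < size)
        & (pts[:, 1] >= 0) & (pts[:, 1] < size)
    )
    pts = pts[inside]

    # Draw landmarks as bright pixels (row index = y, column index = x).
    sketch[pts[:, 1], pts[:, 0]] = 255

    # Overlay the eye regions with a distinct gray level (hypothetical value).
    sketch[np.asarray(eye_mask) > 0] = 128
    return sketch
```

In a two-stage pipeline of the kind the abstract describes, an image like this would sit between the two modules: one module produces the sketch, and the other generates a face conditioned on it.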
Taxonomy
Topics: Face recognition and analysis · Gaze Tracking and Assistive Technology · Hand Gesture Recognition Systems
Methods: Diffusion
