Photorealistic and Identity-Preserving Image-Based Emotion Manipulation with Latent Diffusion Models

Ioannis Pikoulis; Panagiotis P. Filntisis; Petros Maragos

arXiv:2308.03183 · cs.CV · August 8, 2023

TL;DR

This paper explores latent diffusion models for realistic, identity-preserving emotion manipulation in images. In qualitative and quantitative evaluations on AffectNet, it shows higher image quality and realism than GAN-based methods.

Contribution

It combines Latent Diffusion models with CLIP-based text-driven manipulation to edit emotion in in-the-wild images.

Findings

01 Outperforms GAN-based methods in image quality and realism

02 Achieves competitive emotion translation results

03 Provides publicly available code for reproducibility

Abstract

In this paper, we investigate the emotion manipulation capabilities of diffusion models with "in-the-wild" images, a rather unexplored application area relative to the vast and rapidly growing literature on image-to-image translation tasks. Our proposed method builds on several pieces of prior work, the most important being Latent Diffusion models and text-driven manipulation with CLIP latents. We conduct extensive qualitative and quantitative evaluations on AffectNet, demonstrating the superiority of our approach in terms of image quality and realism, while achieving competitive emotion translation results compared to a variety of GAN-based counterparts. Code is released as a publicly available repo.

Code & Models

Repositories

giannispikoulis/dsml-thesis
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications

Methods: Diffusion · Contrastive Language-Image Pre-training