Unsupervised Deep Representations for Learning Audience Facial Behaviors

Suman Saha; Rajitha Navarathna; Leonhard Helminger; Romann Weber

arXiv:1805.04136 · cs.CV · May 14, 2018 · 1 citation

Open Access

TL;DR

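This paper introduces an unsupervised deep learning method that jointly trains a variational auto-encoder (VAE) and a generative adversarial network (GAN) to analyze audience facial behavior. The learned latent representation encodes engagement signals (smiling, laughing) and disengagement signals (yawning) from unlabeled footage of movie audiences.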

Contribution

The paper presents an unsupervised approach that jointly trains a VAE and a GAN to learn a latent representation of facial behavior from unlabeled audience footage.

Findings

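01. The learned latent representation encodes signatures of audience engagement, such as smiling and laughing.

02. It also encodes signatures of disengagement, such as yawning.

03. The results provide a proof of concept for a more general methodology for annotating hard-to-label multimedia data with sparse examples of the signals of interest.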

Abstract

In this paper, we present an unsupervised learning approach for analyzing facial behavior based on a deep generative model combined with a convolutional neural network (CNN). We jointly train a variational auto-encoder (VAE) and a generative adversarial network (GAN) to learn a powerful latent representation from footage of audiences viewing feature-length movies. We show that the learned latent representation successfully encodes meaningful signatures of behaviors related to audience engagement (smiling & laughing) and disengagement (yawning). Our results provide a proof of concept for a more general methodology for annotating hard-to-label multimedia data featuring sparse examples of signals of interest.
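To make the idea of jointly training a VAE and a GAN more concrete, the following is a minimal sketch of the loss terms such a combined model typically balances. It is an illustration under stated assumptions, not the authors' implementation: this page does not give the paper's architecture, loss weights, or reconstruction metric, so the squared-error reconstruction term, the non-saturating adversarial term, the `adv_weight` parameter, and all function names here are hypothetical. Plain Python lists stand in for tensors, and the discriminator outputs are passed in as probabilities.

```python
import math
import random


def kl_diag_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def bce(p, target, eps=1e-7):
    """Binary cross-entropy for a single probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def squared_error(x, x_hat):
    """Sum of squared differences; an assumed stand-in for the reconstruction term."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_gan_losses(x, x_hat, mu, logvar, d_real, d_fake, adv_weight=1.0):
    """Combine VAE and GAN loss terms for one sample.

    d_real and d_fake are the discriminator's probabilities that the real
    input and the reconstruction, respectively, are real.
    """
    recon = squared_error(x, x_hat)
    kl = kl_diag_gaussian(mu, logvar)
    # Discriminator objective: push real inputs toward 1, reconstructions toward 0.
    disc = bce(d_real, 1.0) + bce(d_fake, 0.0)
    # Encoder/decoder adversarial term (non-saturating): push d_fake toward 1.
    adv = bce(d_fake, 1.0)
    enc_dec = recon + kl + adv_weight * adv
    return {"recon": recon, "kl": kl, "disc": disc, "adv": adv, "enc_dec": enc_dec}


if __name__ == "__main__":
    rng = random.Random(0)
    mu, logvar = [0.2, -0.1], [-1.0, -0.5]
    z = reparameterize(mu, logvar, rng)  # latent code that a decoder would consume
    losses = vae_gan_losses(
        x=[0.5, 0.1, 0.9], x_hat=[0.4, 0.2, 0.8],
        mu=mu, logvar=logvar, d_real=0.9, d_fake=0.3,
    )
    print(z, losses)
```

In a real training loop the encoder, decoder, and discriminator would be neural networks updated in alternation, with `enc_dec` driving the encoder and decoder and `disc` driving the discriminator. The latent code `z` is the representation that would then be inspected for behaviors such as smiling or yawning.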

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing