Self-supervised learning of a facial attribute embedding from video

Olivia Wiles; A. Sophia Koepke; Andrew Zisserman

arXiv:1808.06882 · cs.CV · August 22, 2018 · 99 cites

TL;DR

This paper introduces FAb-Net, a self-supervised framework that learns a facial attribute embedding from video without labelled data. The embedding captures head pose, facial landmarks, and facial expression, and matches or outperforms state-of-the-art self-supervised methods.

Contribution

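The paper presents a self-supervised approach that embeds multiple frames from the same video face-track into a common low-dimensional space. It predicts confidence/attention masks so the network can combine information from several source frames, and it uses a curriculum learning regime to improve the learned embedding.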

Findings

01. The network encodes head pose, facial landmarks, and facial expression.

02. It matches or outperforms state-of-the-art self-supervised methods.

03. It approaches the performance of supervised methods without using any labelled data.

Abstract

We propose a self-supervised framework for learning facial attributes by simply watching videos of a human face speaking, laughing, and moving over time. To perform this task, we introduce a network, Facial Attributes-Net (FAb-Net), that is trained to embed multiple frames from the same video face-track into a common low-dimensional space. With this approach, we make three contributions: first, we show that the network can leverage information from multiple source frames by predicting confidence/attention masks for each frame; second, we demonstrate that using a curriculum learning regime improves the learned embedding; finally, we demonstrate that the network learns a meaningful face embedding that encodes information about head pose, facial landmarks and facial expression, i.e. facial attributes, without having been supervised with any labelled data. We are comparable or superior to…
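The abstract says the network predicts a confidence/attention mask for each source frame but does not say how the masks are combined. The sketch below illustrates one common way to do it, and is an assumption rather than the authors' implementation: a per-pixel softmax over the frames' confidence scores, followed by a weighted sum of the per-frame predictions. The function name `fuse_frames` and the flat-list image layout are hypothetical, chosen only to keep the example self-contained.

```python
import math


def fuse_frames(predictions, confidence_logits):
    """Fuse per-frame predictions with a per-pixel softmax over confidences.

    predictions: list of N frames, each a flat list of P pixel values.
    confidence_logits: list of N frames, each a flat list of P raw scores.
    Returns a flat list of P fused pixel values.

    Assumption: softmax weighting is used here for illustration; the source
    text does not state the exact combination rule.
    """
    n_frames = len(predictions)
    n_pixels = len(predictions[0])
    fused = []
    for p in range(n_pixels):
        logits = [confidence_logits[f][p] for f in range(n_frames)]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(score - peak) for score in logits]
        total = sum(weights)
        # Weighted average of the frames' predictions at this pixel.
        fused.append(
            sum(w / total * predictions[f][p] for f, w in enumerate(weights))
        )
    return fused


# Two hypothetical source frames with three pixels each.
preds = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
logits = [[0.0, 2.0, 10.0], [0.0, -1.0, -10.0]]
fused = fuse_frames(preds, logits)
```

With equal confidences the frames are averaged, and where one frame is far more confident its prediction dominates the fused pixel. A real model would apply the same weighting to image tensors rather than flat lists.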

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis