Multi-Domain Multi-Definition Landmark Localization for Small Datasets

David Ferman; Gaurav Bharaj

arXiv:2203.10358·cs.CV·October 17, 2022

Open Access

TL;DR

This paper introduces a multi-domain, multi-definition facial landmark localization method using a Vision Transformer with a novel shared prior, enabling effective training on small datasets and achieving state-of-the-art results across diverse domains.

Contribution

The paper proposes a definition-agnostic landmark prior and a multi-domain learning framework built on a Vision Transformer, improving landmark localization on small datasets across various domains.

Findings

01

State-of-the-art results on COFW and WFLW datasets.

02

Effective localization on small datasets of animals, caricatures, and paintings.

03

Validated with ablation studies and a new pareidolia dataset.

Abstract

We present a novel method for multi-domain, multi-definition learning for facial landmark localization on small datasets. Training a small dataset alongside a large(r) dataset helps with robust learning for the former, and provides a universal mechanism for facial landmark localization for new and/or smaller standard datasets. To this end, we propose a Vision Transformer encoder with a novel decoder that uses a definition-agnostic, shared, landmark semantic group structured prior, which is learnt as we train on more than one dataset concurrently. Due to our novel definition-agnostic group prior, the datasets may vary in landmark definitions and domains. During the decoder stage we use cross- and self-attention, whose output is later fed into domain/definition-specific heads that minimize a Laplacian log-likelihood loss. We achieve state-of-the-art performance on standard landmark…
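As a rough illustration of the Laplacian log-likelihood loss mentioned in the abstract, the sketch below computes the mean negative log-likelihood of ground-truth landmarks under independent Laplace distributions centred on the predictions. The function name `laplace_nll`, the per-landmark `scale` input, and the assumption that the x and y coordinates are independent are all hypothetical choices made for this example; the abstract does not specify the paper's exact parameterization.

```python
import math


def laplace_nll(pred, target, scale):
    """Mean Laplace negative log-likelihood over a set of 2D landmarks.

    pred, target: sequences of (x, y) pairs (predicted and ground-truth).
    scale: sequence of positive per-landmark scales b.
           (Hypothetical: assumed here to be predicted alongside each landmark.)
    """
    if not (len(pred) == len(target) == len(scale)):
        raise ValueError("pred, target and scale must have the same length")
    if len(pred) == 0:
        raise ValueError("at least one landmark is required")

    total = 0.0
    for (px, py), (tx, ty), b in zip(pred, target, scale):
        if b <= 0:
            raise ValueError("scale values must be positive")
        # Univariate Laplace: -log p(t) = log(2b) + |t - mu| / b.
        # Assumption: x and y are independent and share the scale b,
        # so the two per-coordinate terms are simply added.
        total += 2.0 * math.log(2.0 * b) + (abs(tx - px) + abs(ty - py)) / b
    return total / len(pred)


if __name__ == "__main__":
    predictions = [(0.50, 0.50), (0.20, 0.80)]
    ground_truth = [(0.52, 0.47), (0.25, 0.80)]
    scales = [0.05, 0.10]
    print(laplace_nll(predictions, ground_truth, scales))
```

Under this formulation, a larger scale lowers the penalty on the absolute error but raises the `log(2b)` term, so a model that predicts the scale can express per-landmark uncertainty without being rewarded for inflating it indefinitely.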

Taxonomy

Topics: Face recognition and analysis · Cleft Lip and Palate Research · Facial Nerve Paralysis Treatment and Research

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Dropout · Layer Normalization