Multi-Domain Multi-Definition Landmark Localization for Small Datasets
David Ferman, Gaurav Bharaj

TL;DR
This paper introduces a multi-domain, multi-definition facial landmark localization method using a Vision Transformer with a novel shared prior, enabling effective training on small datasets and achieving state-of-the-art results across diverse domains.
Contribution
The paper proposes a definition-agnostic landmark prior and a multi-domain learning framework built on a Vision Transformer, improving landmark localization on small datasets across various domains.
Findings
State-of-the-art results on COFW and WFLW datasets.
Effective localization on small datasets of animals, caricatures, and paintings.
Validated with ablation studies and a new pareidolia dataset.
Abstract
We present a novel method for multi-image-domain and multi-landmark-definition learning for small-dataset facial localization. Training on a small dataset alongside a large(r) dataset helps with robust learning for the former, and provides a universal mechanism for facial landmark localization for new and/or smaller standard datasets. To this end, we propose a Vision Transformer encoder with a novel decoder that uses a definition-agnostic, shared landmark semantic group structured prior, which is learnt as we train on more than one dataset concurrently. Due to our novel definition-agnostic group prior, the datasets may vary in landmark definitions and domains. During the decoder stage we use cross- and self-attention, whose output is later fed into domain/definition-specific heads that minimize a Laplacian-log-likelihood loss. We achieve state-of-the-art performance on standard landmark…
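The abstract says the domain/definition-specific heads minimize a Laplacian log-likelihood loss but does not give its form. The following is a minimal sketch of one common formulation, assuming each head predicts a location and a positive scale per landmark coordinate and that the loss is the mean Laplace negative log-likelihood, log(2b) + |y - mu| / b. The function name `laplace_nll`, the per-coordinate scale, and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
import math


def laplace_nll(pred, scale, target):
    """Mean Laplace negative log-likelihood over landmark coordinates.

    Hypothetical helper (not from the paper).
    pred, target: lists of (x, y) landmark positions.
    scale: list of (bx, by) Laplace scales, each strictly positive.
    """
    total = 0.0
    count = 0
    for (px, py), (bx, by), (tx, ty) in zip(pred, scale, target):
        for p, b, t in ((px, bx, tx), (py, by, ty)):
            if b <= 0:
                raise ValueError("scale must be positive")
            # -log Laplace(t | p, b) = log(2b) + |t - p| / b
            total += math.log(2.0 * b) + abs(t - p) / b
            count += 1
    if count == 0:
        raise ValueError("no landmarks given")
    return total / count


# Two toy "definitions" with different landmark counts; the same loss
# applies to each head regardless of how many landmarks it predicts.
loss_a = laplace_nll([(1.0, 2.0)], [(1.0, 1.0)], [(1.5, 2.0)])
loss_b = laplace_nll(
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.5, 0.5), (0.5, 0.5)],
    [(0.0, 0.0), (1.0, 1.0)],
)
```

Because the loss is averaged per coordinate, heads with different landmark counts contribute on a comparable scale when summed across concurrently trained datasets. Whether the paper normalizes this way is not stated in the abstract.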
Taxonomy
Topics: Face recognition and analysis · Cleft Lip and Palate Research · Facial Nerve Paralysis Treatment and Research
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Dropout · Layer Normalization
