HeadPosr: End-to-end Trainable Head Pose Estimation using Transformer Encoders

Naina Dhingra

arXiv:2202.03548·cs.CV·February 9, 2022

TL;DR

HeadPosr introduces an end-to-end trainable head pose estimation method built on transformer encoders and reports state-of-the-art accuracy, outperforming all compared methods on the AFLW2000 and BIWI datasets.

Contribution

The paper presents a novel transformer-based architecture for head pose estimation (HPE), supports its design choices with extensive ablation studies, and reports results that outperform existing state-of-the-art methods.

Findings

01

Outperforms all compared methods on AFLW2000 and BIWI datasets.

02

Demonstrates the effectiveness of transformer encoders for head pose estimation (HPE).

03

Reports state-of-the-art head pose estimation accuracy on these benchmarks.

Abstract

In this paper, HeadPosr is proposed to predict head pose from a single RGB image. HeadPosr uses a novel architecture that includes a transformer encoder. Concretely, it consists of: (1) a backbone; (2) a connector; (3) a transformer encoder; (4) a prediction head. The significance of using a transformer encoder for head pose estimation (HPE) is studied. An extensive ablation study varies the (1) number of encoders; (2) number of heads; (3) position embedding; (4) activation function; and (5) input channel size of the transformer used in HeadPosr. Further studies using (1) different backbones and (2) different learning rates are also presented. The experiments and ablation studies are conducted on three widely used open-source HPE datasets: 300W-LP, AFLW2000, and BIWI. Experiments illustrate that HeadPosr outperforms all…
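The four-stage pipeline (backbone, connector, transformer encoder, prediction head) can be illustrated with a minimal NumPy sketch of a forward pass. This is not the author's implementation, and several details are assumptions: the backbone is replaced by a precomputed feature map of shape (channels, height, width); the connector is assumed to be a 1x1 convolution, i.e. a per-location linear projection to the transformer width; the encoder uses standard post-norm multi-head self-attention with a feed-forward block; and the prediction head is assumed to average-pool the tokens and regress three Euler angles (yaw, pitch, roll). All names (`HeadPosrSketch`, `num_encoders`, `pos_embedding`, etc.) are hypothetical; the constructor arguments simply mirror the ablation knobs listed above. Weights are random, so outputs are meaningless until trained.

```python
import numpy as np

def sinusoidal_pos_embedding(num_tokens, dim):
    """Fixed sine/cosine position embedding (one of several possible choices)."""
    pos = np.arange(num_tokens)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    # Even feature indices use sin, odd indices use cos.
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Candidate activations for the feed-forward block (an ablation knob).
ACTIVATIONS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "gelu": lambda x: 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3))),
}

def init_encoder_layer(rng, dim, ffn_dim, scale=0.02):
    return {
        "wq": rng.normal(0, scale, (dim, dim)),
        "wk": rng.normal(0, scale, (dim, dim)),
        "wv": rng.normal(0, scale, (dim, dim)),
        "wo": rng.normal(0, scale, (dim, dim)),
        "w1": rng.normal(0, scale, (dim, ffn_dim)),
        "w2": rng.normal(0, scale, (ffn_dim, dim)),
    }

def encoder_layer(x, p, num_heads, act):
    """One post-norm transformer encoder layer on tokens x of shape (n, dim)."""
    n, dim = x.shape
    hd = dim // num_heads

    def split(t):  # (n, dim) -> (heads, n, head_dim)
        return t.reshape(n, num_heads, hd).transpose(1, 0, 2)

    q, k, v = split(x @ p["wq"]), split(x @ p["wk"]), split(x @ p["wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd))  # (heads, n, n)
    heads = (attn @ v).transpose(1, 0, 2).reshape(n, dim)   # concatenate heads
    x = layer_norm(x + heads @ p["wo"])                     # attention + residual
    return layer_norm(x + act(x @ p["w1"]) @ p["w2"])       # feed-forward + residual

class HeadPosrSketch:
    def __init__(self, in_channels=512, dim=64, num_encoders=2, num_heads=4,
                 ffn_dim=128, activation="relu", pos_embedding="sine", seed=0):
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        rng = np.random.default_rng(seed)
        self.num_heads = num_heads
        self.act = ACTIVATIONS[activation]
        self.pos_embedding = pos_embedding  # "sine" or "none" in this sketch
        # Connector: a 1x1 conv is equivalent to a linear map applied at every location.
        self.w_conn = rng.normal(0, 0.02, (in_channels, dim))
        self.layers = [init_encoder_layer(rng, dim, ffn_dim) for _ in range(num_encoders)]
        # Prediction head: pooled token -> (yaw, pitch, roll).
        self.w_head = rng.normal(0, 0.02, (dim, 3))

    def __call__(self, feat):
        c, h, w = feat.shape
        tokens = feat.reshape(c, h * w).T @ self.w_conn  # (h*w, dim)
        if self.pos_embedding == "sine":
            tokens = tokens + sinusoidal_pos_embedding(h * w, tokens.shape[1])
        for p in self.layers:
            tokens = encoder_layer(tokens, p, self.num_heads, self.act)
        return tokens.mean(axis=0) @ self.w_head  # shape (3,)

# Usage: a random array stands in for a backbone feature map.
feat = np.random.default_rng(1).normal(size=(512, 7, 7))
model = HeadPosrSketch(num_encoders=2, num_heads=4, activation="gelu")
print(model(feat).shape)  # (3,)
```

Exposing the number of encoders, heads, activation, position embedding, and input channel size as constructor arguments makes each ablation a one-argument change, which is one convenient way to organize the kind of study the abstract describes.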
