Merging Multiple Datasets for Improved Appearance-Based Gaze Estimation
Liang Wu, Bertram E. Shi

TL;DR
This paper introduces a novel transformer-based architecture and a gaze adaptation module to effectively combine multiple datasets for appearance-based gaze estimation, overcoming protocol and label inconsistencies to improve accuracy.
Contribution
It proposes a two-stage transformer fusion method and a dataset-specific gaze adaptation module to enhance multi-dataset gaze estimation performance.
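One way to picture a dataset-specific gaze adaptation module is as a small per-dataset correction applied to a shared gaze prediction, so that differences in how each dataset defines its gaze angles are absorbed outside the shared estimator. The sketch below is only an illustration under that assumption: the affine form, the class names `GazeAdapter` and `MultiDatasetHead`, the dataset names, and the offset values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class GazeAdapter:
    """Hypothetical per-dataset affine correction of (yaw, pitch) in degrees."""
    scale: tuple = (1.0, 1.0)   # identity scaling by default
    bias: tuple = (0.0, 0.0)    # no label offset by default

    def __call__(self, gaze):
        yaw, pitch = gaze
        return (self.scale[0] * yaw + self.bias[0],
                self.scale[1] * pitch + self.bias[1])


class MultiDatasetHead:
    """Routes a shared gaze prediction through the adapter of its source dataset."""

    def __init__(self, dataset_names):
        # One adapter per dataset; each starts as the identity mapping.
        self.adapters = {name: GazeAdapter() for name in dataset_names}

    def predict(self, shared_gaze, dataset=None):
        # With no dataset given, return the shared prediction unchanged.
        if dataset is None:
            return tuple(shared_gaze)
        return self.adapters[dataset](shared_gaze)


# Illustrative values only: pretend dataset_b labels are offset from the shared frame.
head = MultiDatasetHead(["dataset_a", "dataset_b"])
head.adapters["dataset_b"] = GazeAdapter(bias=(2.0, -1.0))

shared = (10.0, 5.0)                     # (yaw, pitch) from a shared estimator
a = head.predict(shared, "dataset_a")    # identity adapter
b = head.predict(shared, "dataset_b")    # offset adapter
```

In a trained system the adapter parameters would be learned jointly with the shared estimator rather than set by hand; here they are fixed so the routing logic is easy to follow.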
Findings
Improved gaze estimation accuracy by 10-20% over state-of-the-art methods.
Effective handling of dataset protocol and label inconsistencies.
Demonstrated benefits of the proposed architecture through extensive experiments.
Abstract
Multiple datasets have been created for training and testing appearance-based gaze estimators. Intuitively, more data should lead to better performance. However, combining datasets to train a single estimator rarely improves gaze estimation performance. One reason may be differences in the experimental protocols used to obtain the gaze samples, resulting in differences in the distributions of head poses, gaze angles, illumination, etc. Another reason may be the inconsistency between methods used to define gaze angles (label mismatch). We propose two innovations to improve the performance of gaze estimation by leveraging multiple datasets: a change in the estimator architecture and the introduction of a gaze adaptation module. Most state-of-the-art estimators merge information extracted from images of the two eyes and the entire face either in parallel or combine information from the…
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
