Is Geometry Enough? An Evaluation of Landmark-Based Gaze Estimation
Daniele Agostinelli, Thomas Agostinelli, Andrea Generosi, Maura Mengoni

TL;DR
This paper evaluates the effectiveness of landmark-based geometric methods for gaze estimation, comparing them with deep learning models across multiple datasets, and finds that geometric features can be robust and efficient for practical applications.
Contribution
It provides a comprehensive benchmark of landmark-based gaze estimation, introduces a standardized pipeline, and demonstrates the potential of lightweight models for generalization.
Findings
Within a single domain, landmark-based models perform worse than deep learning baselines due to noise.
MLP architectures generalize well across domains, with performance comparable to ResNet18.
Sparse geometric features are sufficient for robust gaze estimation.
Abstract
Appearance-based gaze estimation frequently relies on deep Convolutional Neural Networks (CNNs). These models are accurate, but computationally expensive and act as "black boxes", offering little interpretability. Geometric methods based on facial landmarks are a lightweight alternative, but their performance limits and generalization capabilities remain underexplored in modern benchmarks. In this study, we conduct a comprehensive evaluation of landmark-based gaze estimation. We introduce a standardized pipeline to extract and normalize landmarks from three large-scale datasets (Gaze360, ETH-XGaze, and GazeGene) and train lightweight regression models, specifically Extreme Gradient Boosted trees and two neural architectures: a holistic Multi-Layer Perceptron (MLP) and a siamese MLP designed to capture binocular geometry. We find that landmark-based models exhibit lower performance in…
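The abstract names two technical steps, landmark normalization and a siamese MLP over binocular geometry, without detail in this excerpt. The following is a minimal sketch of what such steps could look like, not the authors' pipeline. The normalization scheme (centering on the eye midpoint and scaling by inter-ocular distance), the function names `normalize_landmarks` and `siamese_forward`, the landmark indices, and the layer sizes are all assumptions chosen for illustration.

```python
import numpy as np


def normalize_landmarks(pts, left_eye_idx, right_eye_idx):
    """Center 2D landmarks on the eye midpoint and scale by inter-ocular distance.

    Assumed scheme for illustration; the paper's normalization may differ.
    """
    pts = np.asarray(pts, dtype=float)
    left = pts[left_eye_idx].mean(axis=0)    # left-eye center
    right = pts[right_eye_idx].mean(axis=0)  # right-eye center
    center = (left + right) / 2.0
    scale = np.linalg.norm(right - left)
    if scale == 0:
        raise ValueError("degenerate landmarks: eye centers coincide")
    return (pts - center) / scale


def siamese_forward(left_feats, right_feats, W_enc, b_enc, W_head, b_head):
    """Apply one shared ReLU encoder to each eye, concatenate, regress 2 outputs.

    The two outputs stand in for gaze angles (e.g. yaw and pitch); this
    output convention is an assumption.
    """
    def encode(x):
        return np.maximum(0.0, x @ W_enc + b_enc)  # same weights for both eyes

    joint = np.concatenate([encode(left_feats), encode(right_feats)], axis=-1)
    return joint @ W_head + b_head


# Toy usage: 6 hypothetical landmarks, two per eye plus two face points.
pts = [[0, 0], [2, 0], [4, 0], [6, 0], [3, 2], [3, 4]]
norm = normalize_landmarks(pts, left_eye_idx=[0, 1], right_eye_idx=[2, 3])

rng = np.random.default_rng(0)
W_enc, b_enc = rng.normal(size=(4, 8)), np.zeros(8)
W_head, b_head = rng.normal(size=(16, 2)), np.zeros(2)
gaze = siamese_forward(norm[[0, 1]].ravel(), norm[[2, 3]].ravel(),
                       W_enc, b_enc, W_head, b_head)
```

Sharing the encoder weights between the two eyes is what makes the second function siamese: both eyes are embedded by the same mapping before the head combines them. The normalization makes the features invariant to image translation and uniform scaling, which is one plausible reason sparse geometric inputs could transfer across datasets.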
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Face recognition and analysis · Visual Attention and Saliency Detection
