VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation
Rakib Hossain Sajib, Md Kishor Morol, Rajan Das Gupta, Mohammad Sakib Mahmood, Shuvra Smaran Das

TL;DR
This paper evaluates large vision-language models for zero-shot human age estimation from facial images, demonstrating their competitive performance and highlighting challenges in fairness and interpretability.
Contribution
The paper introduces a comprehensive zero-shot benchmark for LVLMs on age estimation, comparing multiple models without any fine-tuning and analyzing performance disparities across groups.
Findings
LVLMs achieve competitive zero-shot age estimation performance.
Performance varies across demographic groups and image quality.
The benchmark reveals challenges in fairness, interpretability, and computational cost.
Abstract
Human age estimation from facial images is a challenging computer vision task with significant applications in biometrics, healthcare, and human-computer interaction. Traditional deep learning approaches require extensive labeled datasets and domain-specific training, whereas recent advances in large vision-language models (LVLMs) open the possibility of zero-shot age estimation. This study presents a comprehensive zero-shot evaluation of state-of-the-art LVLMs for facial age estimation, a task traditionally dominated by domain-specific convolutional networks and supervised learning. We assess GPT-4o, Claude 3.5 Sonnet, and LLaMA 3.2 Vision on two benchmark datasets, UTKFace and FG-NET, without any fine-tuning or task-specific adaptation. Using eight evaluation metrics, including MAE, MSE, RMSE, MAPE, MBE, , CCC, and…
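To make the named error metrics concrete, here is a minimal sketch that computes the six recoverable ones (MAE, MSE, RMSE, MAPE, MBE, CCC) from lists of ground-truth and predicted ages. It is not the authors' evaluation code, and the metrics lost from the truncated list are not reproduced. The function name `age_metrics` is hypothetical, and several conventions are assumptions: MBE is taken as mean(prediction − truth), so a positive value means overestimation; MAPE is reported in percent and is undefined when a true age is 0; CCC is Lin's concordance correlation coefficient with population (1/n) moments.

```python
import math
from typing import Dict, Sequence


def age_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: regression metrics for predicted vs. true ages.

    Assumed conventions (not confirmed by the paper):
    - MBE = mean(pred - true), positive means ages are overestimated.
    - MAPE is in percent and requires every true age to be non-zero.
    - CCC is Lin's concordance correlation coefficient, 1/n moments.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true age is 0")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mbe = sum(errors) / n

    # Lin's CCC = 2*cov / (var_true + var_pred + (mean_true - mean_pred)^2)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    ccc = 2.0 * cov / denom if denom > 0 else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse,
            "MAPE": mape, "MBE": mbe, "CCC": ccc}


if __name__ == "__main__":
    # Toy ages for illustration only, not data from UTKFace or FG-NET.
    print(age_metrics([20, 30, 40], [22, 28, 45]))
```

On the toy input the errors are +2, −2, and +5 years, so MAE is 3 and MSE is 11. CCC, unlike plain Pearson correlation, also penalizes a systematic offset between predictions and ground truth, which is why it pairs naturally with MBE.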
