VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation

Rakib Hossain Sajib; Md Kishor Morol; Rajan Das Gupta; Mohammad Sakib Mahmood; Shuvra Smaran Das

arXiv:2603.26015 · cs.CV · March 30, 2026

TL;DR

This paper evaluates large vision-language models for zero-shot human age estimation from facial images, demonstrating their competitive performance and highlighting challenges in fairness and interpretability.

Contribution

It introduces a comprehensive zero-shot benchmark for LVLMs on age estimation, comparing several models without any fine-tuning and analyzing the disparities in their performance.

Findings

1. LVLMs achieve competitive zero-shot age estimation performance.

2. Performance varies across demographic groups and with image quality.

3. The benchmark reveals challenges in fairness, interpretability, and computational cost.
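The findings note that performance varies across demographic groups. One simple way to quantify such a disparity, sketched here as an assumption rather than the paper's documented protocol, is to compute the mean absolute error (MAE) separately for each group and report the gap between the worst and best group. The function `mae_by_group` and the group labels are hypothetical; in practice the labels would come from the dataset's annotations.

```python
from collections import defaultdict
from statistics import fmean


def mae_by_group(y_true, y_pred, groups):
    """Return ({group: MAE}, gap between worst and best group MAE).

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if not groups or not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Collect absolute errors per group label.
    abs_errors = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        abs_errors[g].append(abs(p - t))
    per_group = {g: fmean(errs) for g, errs in abs_errors.items()}
    # A larger gap indicates a larger accuracy disparity between groups.
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap


if __name__ == "__main__":
    # Toy ages with hypothetical group labels "A" and "B" (not paper data).
    per_group, gap = mae_by_group(
        [20, 30, 40, 50], [22, 28, 45, 40], ["A", "A", "B", "B"]
    )
    print(per_group, gap)
```

On the toy data, group A has an MAE of 2.0, group B has 7.5, and the gap is 5.5 years. Reporting the gap alongside overall MAE keeps a good average from hiding a poorly served group.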

Abstract

Human age estimation from facial images is a challenging computer vision task with significant applications in biometrics, healthcare, and human-computer interaction. While traditional deep learning approaches require extensive labeled datasets and domain-specific training, recent advances in large vision-language models (LVLMs) offer the potential for zero-shot age estimation. This study presents a comprehensive zero-shot evaluation of state-of-the-art LVLMs for facial age estimation, a task traditionally dominated by domain-specific convolutional networks and supervised learning. We assess the performance of GPT-4o, Claude 3.5 Sonnet, and LLaMA 3.2 Vision on two benchmark datasets, UTKFace and FG-NET, without any fine-tuning or task-specific adaptation. Using eight evaluation metrics, including MAE, MSE, RMSE, MAPE, MBE, $R^{2}$, CCC, and…
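The abstract names the evaluation metrics but not their formulas. The sketch below computes the seven that are listed (the eighth is cut off and is left out) using their standard textbook definitions; it is an illustration, not the authors' evaluation code. Assumptions: MBE is taken as the mean of predicted minus true age, so positive values mean overestimation; CCC is Lin's concordance correlation coefficient with population variances; MAPE is returned as NaN when any true age is 0, since it is undefined there. The function name `age_metrics` is hypothetical.

```python
import math
from statistics import fmean


def age_metrics(y_true, y_pred):
    """Standard regression metrics for age estimation (illustrative sketch).

    Assumed conventions: MBE = mean(pred - true); CCC uses population
    variances (Lin's definition); MAPE is NaN if any true age is 0.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = fmean(abs(e) for e in errors)
    mse = fmean(e * e for e in errors)
    rmse = math.sqrt(mse)
    mbe = fmean(errors)  # positive = overestimates age on average

    # MAPE divides by the true age, so it is undefined for age-0 labels.
    if any(t == 0 for t in y_true):
        mape = math.nan
    else:
        mape = 100.0 * fmean(abs(e) / t for e, t in zip(errors, y_true))

    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    var_t = fmean((t - mean_t) ** 2 for t in y_true)
    var_p = fmean((p - mean_p) ** 2 for p in y_pred)
    cov = fmean((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))

    # R^2 = 1 - SS_res / SS_tot; dividing both sums by n gives 1 - MSE / var_t.
    r2 = 1.0 - mse / var_t
    # Lin's concordance correlation coefficient.
    ccc = 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape,
            "MBE": mbe, "R2": r2, "CCC": ccc}


if __name__ == "__main__":
    # Toy ages, not data from the paper.
    print(age_metrics([20, 30, 40], [22, 27, 41]))
```

For the toy ages above this gives an MAE of 2.0, an MBE of 0.0, a MAPE of about 7.5%, an $R^{2}$ of about 0.93, and a CCC of about 0.964. Note that MBE can be zero even when individual errors are large, which is why it is reported alongside MAE rather than instead of it.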
