MiVOLO: Multi-input Transformer for Age and Gender Estimation
Maksim Kuprashevich, Irina Tolstykh

TL;DR
MiVOLO is a multi-input transformer model that improves age and gender estimation accuracy, especially in challenging conditions, by integrating face and person image data, achieving state-of-the-art results.
Contribution
The paper introduces MiVOLO, a novel multi-input transformer model that combines face and person data for robust age and gender recognition, and also contributes a new benchmark dataset.
Findings
Achieves state-of-the-art performance on four benchmarks.
Surpasses human-level accuracy in age recognition.
Runs in real time while maintaining high accuracy.
Abstract
Age and gender recognition in the wild is a highly challenging task: apart from the variability of conditions, pose complexities, and varying image quality, there are cases where the face is partially or completely occluded. We present MiVOLO (Multi Input VOLO), a straightforward approach for age and gender estimation using the latest vision transformer. Our method integrates both tasks into a unified dual input/output model, leveraging not only facial information but also person image data. This improves the generalization ability of our model and enables it to deliver satisfactory results even when the face is not visible in the image. To evaluate our proposed model, we conduct experiments on four popular benchmarks and achieve state-of-the-art performance, while demonstrating real-time processing capabilities. Additionally, we introduce a novel benchmark based on images from the Open…
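The abstract describes one model with two inputs (face and person crops) and two outputs (age and gender) that still works when the face is not visible. The plain-Python sketch below illustrates that dual input/output idea under loudly stated assumptions: the inputs are already-extracted token vectors, fusion is a parameter-free single-head cross-attention with residual addition and mean pooling, the age head is a linear regressor, and the gender head is a softmax over two logits. The names `fuse`, `cross_attend`, and `predict` and the fusion scheme are hypothetical illustrations, not MiVOLO's actual architecture or code; the real model builds on VOLO and is specified in the paper.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(tokens: List[Vector]) -> Vector:
    # Average the tokens dimension by dimension into one feature vector.
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def cross_attend(queries: List[Vector], context: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with no learned projections:
    each query token attends over the context tokens (used as keys and values)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in context])
        out.append([sum(w * c[i] for w, c in zip(weights, context)) for i in range(d)])
    return out


def fuse(face: Optional[List[Vector]], person: Optional[List[Vector]]) -> Vector:
    """Hypothetical fusion: cross-attend face<->person, add residuals, pool, average.
    Falls back to a single stream when one input is missing (e.g. occluded face)."""
    if face and person:
        face_out = [[a + b for a, b in zip(t, c)] for t, c in zip(face, cross_attend(face, person))]
        person_out = [[a + b for a, b in zip(t, c)] for t, c in zip(person, cross_attend(person, face))]
        return [(x + y) / 2 for x, y in zip(mean_pool(face_out), mean_pool(person_out))]
    if face:
        return mean_pool(face)
    if person:
        return mean_pool(person)
    raise ValueError("at least one of face or person tokens is required")


def predict(feature: Vector, age_w: Vector, age_b: float,
            gender_w: List[Vector], gender_b: Vector) -> Tuple[float, Vector]:
    """Two heads on one shared feature: linear age regression and softmax gender probabilities."""
    age = dot(age_w, feature) + age_b
    gender_probs = softmax([dot(w, feature) + b for w, b in zip(gender_w, gender_b)])
    return age, gender_probs


if __name__ == "__main__":
    # Toy 3-dimensional tokens and hand-picked weights, for illustration only.
    face = [[0.2, 0.1, 0.0], [0.0, 0.3, 0.1]]
    person = [[0.5, 0.0, 0.2], [0.1, 0.1, 0.1], [0.0, 0.4, 0.3]]
    heads = dict(age_w=[10.0, 20.0, 5.0], age_b=30.0,
                 gender_w=[[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0]], gender_b=[0.0, 0.0])
    for label, f, p in [("face+person", face, person), ("person only", None, person)]:
        age, probs = predict(fuse(f, p), **heads)
        print(label, round(age, 2), [round(x, 3) for x in probs])
```

Both heads read the same fused feature, which is what makes this a single dual-output model rather than two separate estimators; the fallback branches mirror the abstract's point that a prediction is still possible when the face crop is absent. A trained model would replace the toy weights and the parameter-free attention with learned components.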
Taxonomy
Topics: Face recognition and analysis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Adam · Dense Connections · Softmax · Multi-Head Attention
