Understanding the Representation and Representativeness of Age in AI Data Sets
Joon Sung Park, Michael S. Bernstein, Robin N. Brewer, Ece Kamar, Meredith Ringel Morris

TL;DR
This paper investigates how age, and older age in particular, is represented in AI face data sets. It reveals significant under-representation of older adults and inconsistent age documentation, both of which affect the fairness and inclusivity of AI models.
Contribution
It provides a systematic analysis of age representation in 92 publicly available face data sets, highlighting gaps and inconsistencies in age documentation and representation.
Findings
Older adults are significantly under-represented in face data sets.
Only 24 data sets include age-related metadata.
Inconsistent methods are used to record age information.
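One way to make "under-represented" concrete is to compare a group's share of a data set with its share of the general population. The following is a minimal sketch of that comparison, not the authors' method: the function name `representation_ratio`, the age bins, and all counts and population shares are hypothetical placeholders chosen for illustration, not figures from the paper.

```python
def representation_ratio(dataset_counts, population_shares, group):
    """Return (share of `group` in the data set) / (share in the population).

    A value below 1.0 means the group is under-represented relative to
    the population; a value of 1.0 means proportional representation.
    """
    total = sum(dataset_counts.values())
    if total == 0:
        raise ValueError("data set contains no subjects")
    dataset_share = dataset_counts[group] / total
    return dataset_share / population_shares[group]


# Hypothetical placeholder numbers, NOT taken from the paper.
example_counts = {"0-17": 150, "18-64": 800, "65+": 50}
example_population = {"0-17": 0.25, "18-64": 0.60, "65+": 0.15}

ratio = representation_ratio(example_counts, example_population, "65+")
print(f"65+ representation ratio: {ratio:.2f}")
```

A comparison like this only works when a data set documents closed age intervals; with open-ended or missing age labels, the data set share cannot be computed reliably.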
Abstract
A diverse representation of different demographic groups in AI training data sets is important in ensuring that the models will work for a large range of users. To this end, recent efforts in AI fairness and inclusion have advocated for creating AI data sets that are well-balanced across race, gender, socioeconomic status, and disability status. In this paper, we contribute to this line of work by focusing on the representation of age by asking whether older adults are represented proportionally to the population at large in AI data sets. We examine publicly-available information about 92 face data sets to understand how they codify age as a case study to investigate how the subjects' ages are recorded and whether older generations are represented. We find that older adults are very under-represented; five data sets in the study that explicitly documented the closed age intervals of…
