SAFARI: A Community-Engaged Approach and Dataset of Stereotype Resources in the Sub-Saharan African Context

Aishwarya Verma; Laud Ammah; Olivia Nercy Ndlovu Lucas; Andrew Zaldivar; Vinodkumar Prabhakaran; Sunipa Dev

arXiv:2602.22404 · cs.CL · February 27, 2026


TL;DR

This paper presents SAFARI, a multilingual stereotype dataset covering four sub-Saharan African countries. It was built with community-engaged methods to improve AI safety evaluation and representation for regions underrepresented in NLP resources.

Contribution

The paper introduces a culturally sensitive, community-engaged methodology for collecting stereotype data, and a multilingual dataset covering four African countries underrepresented in NLP resources: Ghana, Kenya, Nigeria, and South Africa.

Findings

01

Created a dataset of over 6,700 stereotypes: 3,534 in English and 3,206 in other languages.

02

Developed a reproducible, community-engaged data collection methodology.

03

Expanded the coverage of sub-Saharan African stereotypes in NLP resources, where the region has been severely underrepresented.

Abstract

Stereotype repositories are critical to assess generative AI model safety, but currently lack adequate global coverage. It is imperative to prioritize targeted expansion, strategically addressing existing deficits, over merely increasing data volume. This work introduces a multilingual stereotype resource covering four sub-Saharan African countries that are severely underrepresented in NLP resources: Ghana, Kenya, Nigeria, and South Africa. By using socioculturally situated, community-engaged methods, including telephonic surveys moderated in native languages, we establish a reproducible methodology that is sensitive to the region's complex linguistic diversity and traditional orality. By deliberately balancing the sample across diverse ethnic and demographic backgrounds, we ensure broad coverage, resulting in a dataset of 3,534 stereotypes in English and 3,206 stereotypes across 15…



Taxonomy

Topics: Ethics and Social Impacts of AI · Computational and Text Analysis Methods · Hate Speech and Cyberbullying Detection