From Perceived Effectiveness to Measured Impact: Identity-Aware Evaluation of Automated Counter-Stereotypes
Svetlana Kiritchenko, Anna Kerkhof, Isar Nejadgholi, Kathleen C. Fraser

TL;DR
This study evaluates the real-world impact of automatically generated counter-stereotypes on gender bias across user demographics (gender and age), revealing identity-sensitive effects and a divergence between perceived and actual effectiveness.
Contribution
It introduces an identity-aware evaluation framework for automated counter-stereotypes, highlighting nuanced demographic differences in bias mitigation effectiveness.
Findings
Older, male participants showed measurable implicit bias reduction.
Younger women exhibited increased bias after interventions.
Perceived effectiveness often did not match measured bias reduction.
Abstract
We investigate the effect of automatically generated counter-stereotypes on gender bias held by users of various demographics on social media. Building on recent NLP advancements and social psychology literature, we evaluate two counter-stereotype strategies -- counter-facts and broadening universals (i.e., stating that anyone can have a trait regardless of group membership) -- which have been identified as the most potentially effective in previous studies. We assess the real-world impact of these strategies on mitigating gender bias across user demographics (gender and age), through the Implicit Association Test and the self-reported measures of explicit bias and perceived utility. Our findings reveal that actual effectiveness does not align with perceived effectiveness, and the former is a nuanced and sometimes divergent phenomenon across demographic groups. While overall bias…
