Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback

Emilia Agis Lerner; Florian E. Dorner; Elliott Ash; Naman Goel

arXiv:2406.05902 · cs.LG · June 11, 2024


TL;DR

This paper investigates how demographic differences among annotators influence fairness preferences in content moderation and demonstrates that ensemble classifiers trained on diverse demographic annotations improve fairness outcomes.

Contribution

The paper introduces a novel dataset of fairness preferences from annotators of different demographic groups and shows that ensemble classifiers built on these diverse annotations improve fairness in AI content moderation.

Findings

01

Significant gaps in fairness preferences across annotator demographics (race, age, political stance, education level, LGBTQ+ identity).

02

Demographics mentioned in the text influence perceptions of individual fairness.

03

Ensemble classifiers outperform single classifiers across demographic groups.
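The first finding compares how often different annotator groups give a particular answer about the same comment pairs. One simple way to express such a gap is a difference in proportions. The sketch below is illustrative only: the `(group, label)` data layout, the labels `"same"` and `"different"`, and the function `preference_gap` are hypothetical assumptions, not the paper's dataset schema or its statistical analysis.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable
from typing import Tuple


def preference_gap(
    annotations: Iterable[Tuple[Hashable, str]],
    group_a: Hashable,
    group_b: Hashable,
    label: str,
) -> float:
    """Share of group_a choosing `label` minus share of group_b choosing it.

    Hypothetical layout: one (annotator_group, chosen_label) pair per judgment
    on a comment pair.
    """
    # group -> [times `label` was chosen, total judgments]
    counts = defaultdict(lambda: [0, 0])
    for group, choice in annotations:
        counts[group][1] += 1
        if choice == label:
            counts[group][0] += 1

    def share(group: Hashable) -> float:
        chosen, total = counts[group]
        if total == 0:
            raise ValueError(f"no annotations for group {group!r}")
        return chosen / total

    return share(group_a) - share(group_b)


# Made-up toy data: group "A" answers "same" 2 of 3 times,
# group "B" answers "same" 1 of 4 times.
toy = [
    ("A", "same"), ("A", "same"), ("A", "different"),
    ("B", "same"), ("B", "different"), ("B", "different"), ("B", "different"),
]
print(preference_gap(toy, "A", "B", "same"))  # 2/3 - 1/4 = 5/12
```

A positive value means group A picks the label more often than group B. A real analysis would also need a significance test, which this sketch deliberately leaves out.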

Abstract

There is a growing body of work on learning from human feedback to align various aspects of machine learning systems with human values and preferences. We consider the setting of fairness in content moderation, in which human feedback is used to determine how two comments, referencing different sensitive attribute groups, should be treated in comparison to one another. With a novel dataset collected from Prolific and MTurk, we find significant gaps in fairness preferences depending on the race, age, political stance, educational level, and LGBTQ+ identity of annotators. We also demonstrate that demographics mentioned in text have a strong influence on how users perceive individual fairness in moderation. Further, we find that differences also exist in downstream classifiers trained to predict human preferences. Finally, we observe that an ensemble, giving equal weight to classifiers…
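The equal-weight ensemble mentioned in the abstract can be illustrated by averaging the outputs of per-group classifiers. This is a minimal sketch under stated assumptions: each classifier is a hypothetical callable that maps a pair of comments to a probability that they should be moderated the same way, and `equal_weight_ensemble` plus the constant toy classifiers are stand-ins, not the authors' models or code.

```python
from typing import Callable, Sequence

# Hypothetical interface: a classifier maps a pair of comments to the
# probability that the two should be moderated the same way.
Classifier = Callable[[str, str], float]


def equal_weight_ensemble(classifiers: Sequence[Classifier]) -> Classifier:
    """Average per-group classifiers, giving each one weight 1/len(classifiers)."""
    if not classifiers:
        raise ValueError("need at least one classifier")
    members = list(classifiers)

    def predict(comment_a: str, comment_b: str) -> float:
        # Every group's classifier counts equally, regardless of
        # how many annotations it was trained on.
        return sum(clf(comment_a, comment_b) for clf in members) / len(members)

    return predict


# Toy stand-ins for classifiers trained on three groups' annotations.
per_group = [lambda a, b: 0.9, lambda a, b: 0.6, lambda a, b: 0.3]
ensemble = equal_weight_ensemble(per_group)
print(ensemble("comment mentioning group X", "comment mentioning group Y"))
```

Here the prediction is (0.9 + 0.6 + 0.3) / 3 = 0.6. One plausible reading of the design choice: if all annotations were pooled into a single classifier, the most heavily represented group could dominate its predictions, whereas equal weighting keeps each group's classifier at the same influence.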


Code & Models

Repositories

emiliaagis/differences-in-fairness-preferences-acl-2024 (official)

Videos

Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback · Underline

Taxonomy

Topics: Ethics and Social Impacts of AI

Methods: ALIGN