How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies

Alina Leidinger; Richard Rogers

arXiv:2407.11733·cs.CL·August 2, 2024·1 cites

TL;DR

This paper evaluates how large language models (LLMs) mitigate stereotyping harms, revealing improvements with safety prompts but persistent biases, especially regarding certain social groups, and discusses implications for policy and model development.

Contribution

It introduces a novel autocompletion prompt-based evaluation for stereotyping in LLMs and provides empirical insights into their effectiveness and limitations in mitigating social harms.

Findings

01

Safety prompts reduce stereotyping but do not eliminate it.

02

LLMs show biases towards certain ethnicities and sexual orientations.

03

Intersectional identities trigger more stereotyping.

Abstract

With the widespread availability of LLMs since the release of ChatGPT and increased public scrutiny, commercial model development appears to have focused its efforts on 'safety' training concerning legal liabilities at the expense of social impact evaluation. This mimics a similar trend observed for search engine autocompletion some years prior. We draw on scholarship from NLP and search engine auditing and present a novel evaluation task in the style of autocompletion prompts to assess stereotyping in LLMs. We assess LLMs using four metrics, namely refusal rates, toxicity, sentiment and regard, with and without safety system prompts. Our findings indicate an improvement to stereotyping outputs with the system prompt, but overall a lack of attention by LLMs under study to certain harms classified as toxic, particularly for prompts about peoples/ethnicities and sexual…
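The evaluation setup described in the abstract (autocompletion-style prompts, scored with and without a safety system prompt) can be illustrated with a minimal sketch of one of the four metrics, the refusal rate. Everything named here is an assumption made for illustration and is not taken from the paper: the prompt templates, the keyword-based refusal check, the `generate` callable interface, and the `toy_model` stand-in for a real LLM.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical templates in the style of search-engine autocompletion queries.
TEMPLATES = ["Why are {group} so", "Why do {group} always"]

# Hypothetical refusal markers; a real study would need a validated method.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")


def build_prompts(groups: Iterable[str], templates: Iterable[str] = TEMPLATES) -> List[str]:
    """Cross every group term with every template."""
    templates = list(templates)
    return [t.format(group=g) for g in groups for t in templates]


def is_refusal(completion: str) -> bool:
    """Crude keyword check for whether a completion declines to answer."""
    text = completion.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(
    prompts: Iterable[str],
    generate: Callable[[str, Optional[str]], str],
    system_prompt: Optional[str] = None,
) -> float:
    """Fraction of prompts whose completion is classified as a refusal."""
    prompts = list(prompts)
    if not prompts:
        return 0.0
    refused = sum(is_refusal(generate(p, system_prompt)) for p in prompts)
    return refused / len(prompts)


def toy_model(prompt: str, system_prompt: Optional[str]) -> str:
    """Stand-in for an LLM call: refuses only when a system prompt is set."""
    if system_prompt:
        return "I'm sorry, but I can't complete that sentence."
    return prompt + " ..."


prompts = build_prompts(["teachers", "engineers"])
baseline = refusal_rate(prompts, toy_model)
with_safety = refusal_rate(prompts, toy_model, system_prompt="Respond safely.")
print(baseline, with_safety)
```

Comparing the two rates mirrors the with/without system prompt contrast in the abstract. The other three metrics (toxicity, sentiment, regard) would need dedicated classifiers and are left out of this sketch.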

Taxonomy

Topics: FinTech, Crowdfunding, Digital Finance · Cybercrime and Law Enforcement Studies

Methods: Softmax · Attention Is All You Need