"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations

Preetam Prabhu Srikar Dammu; Hayoung Jung; Anjali Singh; Monojit Choudhury; Tanushree Mitra

arXiv:2405.05378 · cs.CL · May 10, 2024 · 1 citation

TL;DR

This paper introduces the CHAST metrics to detect covert harms in LLM conversations, revealing that most models generate subtle yet harmful content, especially regarding non-Western concepts like caste.

Contribution

It develops a new set of metrics grounded in social science to identify covert harms in LLMs, addressing cultural biases and subtle harm manifestations.

Findings

01 Most LLMs produce covert harms in conversations.

02 Harms are more extreme with non-Western concepts like caste.

03 Existing detection methods may overlook these subtle harms.

Abstract

Large language models (LLMs) have emerged as an integral part of modern societies, powering user-facing applications such as personal assistants and enterprise applications like recruitment tools. Despite their utility, research indicates that LLMs perpetuate systemic biases. Yet, prior works on LLM harms predominantly focus on Western concepts like race and gender, often overlooking cultural concepts from other parts of the world. Additionally, these studies typically investigate "harm" as a singular dimension, ignoring the various and subtle forms in which harms manifest. To address this gap, we introduce the Covert Harms and Social Threats (CHAST), a set of seven metrics grounded in social science literature. We utilize evaluation models aligned with human assessments to examine the presence of covert harms in LLM-generated conversations, particularly in the context of recruitment.…

Code & Models

Repositories

hiyouga/llama-factory (PyTorch, Official)

Videos

"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations · underline

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Sparse Evolutionary Training · Focus