"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Preetam Prabhu Srikar Dammu, Hayoung Jung, Anjali Singh, Monojit Choudhury, Tanushree Mitra

TL;DR
This paper introduces the CHAST metrics to detect covert harms in LLM conversations, revealing that most models generate subtle yet harmful content, especially regarding non-Western concepts like caste.
Contribution
It develops a new set of metrics grounded in social science to identify covert harms in LLMs, addressing cultural biases and subtle harm manifestations.
Findings
Most LLMs produce covert harms in conversations.
Harms are more extreme with non-Western concepts like caste.
Existing detection methods may overlook these subtle harms.
Abstract
Large language models (LLMs) have emerged as an integral part of modern societies, powering user-facing applications such as personal assistants and enterprise applications like recruitment tools. Despite their utility, research indicates that LLMs perpetuate systemic biases. Yet, prior works on LLM harms predominantly focus on Western concepts like race and gender, often overlooking cultural concepts from other parts of the world. Additionally, these studies typically investigate "harm" as a singular dimension, ignoring the various and subtle forms in which harms manifest. To address this gap, we introduce the Covert Harms and Social Threats (CHAST), a set of seven metrics grounded in social science literature. We utilize evaluation models aligned with human assessments to examine the presence of covert harms in LLM-generated conversations, particularly in the context of recruitment.…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Sparse Evolutionary Training · Focus
