Ethical and social risks of harm from Language Models

Laura Weidinger; John Mellor; Maribeth Rauh; Conor Griffin; Jonathan Uesato; Po-Sen Huang; Myra Cheng; Mia Glaese; Borja Balle; Atoosa Kasirzadeh; Zac Kenton; Sasha Brown; Will Hawkins; Tom Stepleton; Courtney Biles; Abeba Birhane; Julia Haas; Laura Rimell; Lisa Anne Hendricks; William Isaac; Sean Legassick; Geoffrey Irving; Iason Gabriel

arXiv:2112.04359·cs.CL·December 9, 2021·71 cites

Open Access · 1 dataset

TL;DR

This paper systematically analyzes the diverse ethical and social risks associated with large-scale Language Models, emphasizing the importance of understanding, mitigating, and responsibly managing these risks across multiple domains.

Contribution

It provides a comprehensive, multidisciplinary framework for identifying and analyzing 21 specific risks of LMs, along with mitigation strategies and organizational responsibilities.

Findings

01. Identifies six key risk areas, including discrimination, misinformation, and environmental harms.

02. Analyzes 21 specific risks in detail, with potential mitigation approaches.

03. Highlights the importance of collaboration and further research in risk assessment.

Abstract

This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences. We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, IV. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly…

Code & Models

Datasets

6rightjade/expguardmix
dataset· 217 dl

Taxonomy

Topics: Hate Speech and Cyberbullying Detection