Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, and Kai-Wei Chang

arXiv:2106.01465 · cs.CL · June 4, 2021



TL;DR

This paper asks whether language models can understand and act on ethical interventions stated in natural language, finds that current models largely fail to change their behavior in response, and proposes this as a new challenge for AI ethics and language understanding.

Contribution

Introduces the Linguistic Ethical Interventions (LEI) task to evaluate models' ability to comprehend and act on ethical instructions in natural language.

Findings

01. Current models respond poorly to ethical interventions

02. Few-shot learning improves responses but remains insufficient

03. LEI presents a new challenge for language understanding and AI ethics

Abstract

Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior by communicating context-specific principles of ethics and equity to it. To this end, we build upon recent methods for quantifying a system's social stereotypes, augmenting them with different kinds of ethical interventions and the desired model behavior under such interventions. Our zero-shot evaluation finds that even today's powerful neural language models are extremely poor ethical-advice takers, that is, they respond surprisingly little to…
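The abstract defines LEI at the level of a task: an intervention stated in natural language is given to a QA model alongside a context, and the model's answer should shift toward a desired behavior. The sketch below is a minimal, hypothetical illustration of how such an instance and a simple behavioral check might be represented. It assumes that the intervention is appended to the context, that the QA model assigns a score to each of the two subjects mentioned in an underspecified question, and that the desired behavior under an ethical intervention is for neither subject to be favored. The names `LEIExample`, `subject_gap`, `gap_reduction` and `follows_advice`, the 0.05 tolerance, and the toy scores are illustrative assumptions, not the paper's code or evaluation metric; the official implementation lives in the allenai/ethical-interventions repository.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LEIExample:
    """Hypothetical container for one LEI-style instance (not the paper's format)."""
    context: str               # underspecified context mentioning two subjects
    intervention: str          # natural-language ethical advice
    question: str              # question the context does not actually answer
    subjects: tuple[str, str]  # the two candidate answers

    def intervened_context(self) -> str:
        # Assumption: the intervention is simply appended to the context.
        return f"{self.context} {self.intervention}"


def subject_gap(scores: dict[str, float], subjects: tuple[str, str]) -> float:
    """Absolute difference between the model's scores for the two subjects."""
    a, b = subjects
    return abs(scores[a] - scores[b])


def gap_reduction(before: dict[str, float], after: dict[str, float],
                  subjects: tuple[str, str]) -> float:
    """How much the intervention shrank the preference for one subject."""
    return subject_gap(before, subjects) - subject_gap(after, subjects)


def follows_advice(after: dict[str, float], subjects: tuple[str, str],
                   tol: float = 0.05) -> bool:
    # Hypothetical criterion: the advice counts as "taken" if, after the
    # intervention, the model favors neither subject by more than `tol`.
    return subject_gap(after, subjects) <= tol


if __name__ == "__main__":
    ex = LEIExample(
        context="Subject A and Subject B were waiting at the bus stop.",
        intervention="We should not make assumptions about people based on their group.",
        question="Who was late?",
        subjects=("Subject A", "Subject B"),
    )
    # Toy scores standing in for a QA model's outputs, without and with the intervention.
    before = {"Subject A": 0.8, "Subject B": 0.2}
    after = {"Subject A": 0.52, "Subject B": 0.48}
    print(ex.intervened_context())
    print(round(gap_reduction(before, after, ex.subjects), 2), follows_advice(after, ex.subjects))
```

Treating the score gap as the signal keeps the check independent of any particular QA architecture; a real evaluation would obtain `before` and `after` by running the same model on `ex.context` and `ex.intervened_context()`.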


Code & Models

Repositories

allenai/ethical-interventions
pytorch · Official


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications