Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?
Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, and, Kai-Wei Chang

TL;DR
This paper explores whether language models can understand and respond to natural language ethical interventions to modify their behavior, revealing current limitations and proposing a new challenge for AI ethics and understanding.
Contribution
Introduces the Linguistic Ethical Interventions (LEI) task to evaluate models' ability to comprehend and act on ethical instructions in natural language.
Findings
Current models respond poorly to ethical interventions
Few-shot learning improves responses but remains insufficient
LEI presents a new challenge for language understanding and AI ethics
Abstract
Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior by communicating context-specific principles of ethics and equity to it. To this end, we build upon recent methods for quantifying a system's social stereotypes, augmenting them with different kinds of ethical interventions and the desired model behavior under such interventions. Our zero-shot evaluation finds that even today's powerful neural language models are extremely poor ethical-advice takers, that is, they respond surprisingly little to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
