What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations
Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi

TL;DR
This paper introduces a novel iterative self-distillation method to generate high-quality, diverse contextual explanations for moral judgments, enhancing the understanding of nuanced human moral reasoning.
Contribution
It presents a new approach combining self-distillation, filtering, and imitation learning to improve the validity and diversity of moral reasoning contexts and rationales.
Findings
Produced a dataset of 1.2 million contextualized moral judgments
Achieved high human agreement rates of 85.9% to 99.8%
Final model outperforms intermediate models significantly
Abstract
Moral or ethical judgments rely heavily on the specific contexts in which they occur. Understanding varying shades of defeasible contextualizations (i.e., additional information that strengthens or attenuates the moral acceptability of an action) is critical to accurately represent the subtlety and intricacy of grounded human moral judgment in real-life scenarios. We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable, along with commonsense rationales that justify the reasoning. To elicit high-quality task data, we take an iterative self-distillation approach that starts from a small amount of unstructured seed knowledge from GPT-3 and then alternates between (1) self-distillation from student models; (2) targeted filtering with a critic model trained by human judgment (to boost validity) and NLI (to boost…
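The loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the critic, the NLI-based diversity check, and the student training step are replaced here by toy stand-ins, and every name in the code (Contextualization, filter_candidates, iterative_self_distillation, and the toy_* helpers) is hypothetical. Because the abstract is truncated after step (2), the training step follows the "imitation learning" wording of the Contribution section and is an assumption about the overall shape of the method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Contextualization:
    """One task instance: a context that shifts the acceptability of an action."""
    action: str
    context: str
    effect: str      # "strengthens" or "weakens" moral acceptability
    rationale: str   # commonsense justification for the shift


Generator = Callable[[str], List[Contextualization]]


def filter_candidates(candidates, is_valid, is_redundant):
    """Keep candidates that pass the validity check and are not near-duplicates."""
    kept: List[Contextualization] = []
    for cand in candidates:
        if not is_valid(cand):
            continue  # stand-in for the critic model trained on human judgments
        if any(is_redundant(cand, prev) for prev in kept):
            continue  # stand-in for the NLI-based diversity filter
        kept.append(cand)
    return kept


def iterative_self_distillation(actions, generate: Generator, is_valid,
                                is_redundant, train, rounds: int):
    """Alternate generation, filtering, and retraining for a fixed number of rounds."""
    dataset: List[Contextualization] = []
    for _ in range(rounds):
        candidates = [c for a in actions for c in generate(a)]
        dataset = filter_candidates(candidates, is_valid, is_redundant)
        generate = train(dataset)  # the retrained student generates the next round
    return dataset


# --- Toy stand-ins (assumptions, for illustration only) ---
def toy_generate(action: str) -> List[Contextualization]:
    return [
        Contextualization(action, "to keep warm at a campsite", "strengthens",
                          "A controlled campfire harms no one."),
        Contextualization(action, "to keep warm at a campsite", "strengthens",
                          "Repeats the first candidate."),
        Contextualization(action, "", "weakens", "No context was given."),
        Contextualization(action, "inside a neighbor's house", "weakens",
                          "It destroys property and endangers people."),
    ]


def toy_is_valid(c: Contextualization) -> bool:
    return bool(c.context.strip()) and c.effect in {"strengthens", "weakens"}


def toy_is_redundant(a: Contextualization, b: Contextualization) -> bool:
    return a.action == b.action and a.context == b.context and a.effect == b.effect


def toy_train(examples: List[Contextualization]) -> Generator:
    # A "student" that simply replays the filtered examples it was trained on.
    def student(action: str) -> List[Contextualization]:
        return [e for e in examples if e.action == action]
    return student


final = iterative_self_distillation(
    ["setting a fire"], toy_generate, toy_is_valid, toy_is_redundant,
    toy_train, rounds=2,
)
```

In this toy run the duplicate and the empty-context candidates are dropped in the first round, and the replaying student leaves the surviving pair unchanged in the second. In a real system the three callables would wrap trained models rather than string comparisons.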
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Byte Pair Encoding · Dropout · Weight Decay · Layer Normalization · Linear Warmup With Cosine Annealing
