SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures

Megan Ung; Jing Xu; Y-Lan Boureau

arXiv:2110.07518 · cs.CL · May 6, 2022

TL;DR

SaFeRDialogues introduces a task and dataset for training conversational models to respond gracefully to feedback about their safety failures, making civil conversations more likely without losing engagement.

Contribution

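The paper proposes a task and a dataset of 10k dialogues, each containing a safety failure, feedback signaling it, and a response acknowledging that feedback. Fine-tuning on this dataset teaches models to respond to such feedback gracefully rather than defensively or obliviously.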

Findings

01 Models fine-tuned on SaFeRDialogues produce more civil responses.

02 Fine-tuning does not reduce engagingness or overall conversational quality.

03 Human raters prefer models trained with this dataset for safer, more respectful interactions.

Abstract

Current open-domain conversational models can easily be made to talk in inadequate ways. Online learning from conversational feedback given by the conversation partner is a promising avenue for a model to improve and adapt, so as to generate fewer of these safety failures. However, current state-of-the-art models tend to react to feedback with defensive or oblivious responses. This makes for an unpleasant experience and may discourage conversation partners from giving feedback in the future. This work proposes SaFeRDialogues, a task and dataset of graceful responses to conversational feedback about safety failures. We collect a dataset of 10k dialogues demonstrating safety failures, feedback signaling them, and a response acknowledging the feedback. We show how fine-tuning on this dataset results in conversations that human raters deem considerably more likely to lead to a civil…

Taxonomy

Topics: Topic Modeling · Multi-Agent Systems and Negotiation · Software Engineering Research