Language Model Detoxification in Dialogue with Contextualized Stance Control

Jing Qian; Xifeng Yan

arXiv:2301.10368 · cs.CL · January 26, 2023

TL;DR

This paper presents a novel method for detoxifying language models in dialogue by controlling the stance of a response according to the context. It reduces toxicity while also discouraging responses that support offensive language in the context.
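The context-dependent objective can be made concrete with a small decision rule. This is an illustrative sketch only, not the authors' training objective. The stance labels (`agree`, `disagree`, `neutral`) and the helpers `desired_target` and `is_acceptable` are hypothetical names. The sketch also assumes that binary offensiveness and toxicity judgments are already available.

```python
from dataclasses import dataclass

# Hypothetical stance labels; the paper's actual label set may differ.
AGREE, DISAGREE, NEUTRAL = "agree", "disagree", "neutral"


@dataclass(frozen=True)
class ControlTarget:
    """Attributes a detoxified response should have (illustrative only)."""
    non_toxic: bool
    allowed_stances: frozenset


def desired_target(context_is_offensive: bool) -> ControlTarget:
    """Return the desired attributes, which depend on the context."""
    if context_is_offensive:
        # Agreeing with an offensive context is implicit offensive language,
        # so only non-supportive stances are allowed.
        return ControlTarget(True, frozenset({DISAGREE, NEUTRAL}))
    # For a benign context, any stance is acceptable.
    return ControlTarget(True, frozenset({AGREE, DISAGREE, NEUTRAL}))


def is_acceptable(context_is_offensive: bool,
                  response_is_toxic: bool,
                  response_stance: str) -> bool:
    """Check a response against the context-dependent target."""
    target = desired_target(context_is_offensive)
    # Self-toxicity is rejected regardless of the context.
    if target.non_toxic and response_is_toxic:
        return False
    return response_stance in target.allowed_stances
```

For example, `is_acceptable(True, False, "agree")` returns `False`. The response itself is non-toxic, but it supports an offensive context. A metric that only measures self-toxicity misses exactly this implicit case.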

Contribution

It introduces meta prefixes that learn a contextualized stance control strategy. This makes detoxification adapt dynamically to the input context.

Findings

01. Effective context-dependent stance control learned

02. Low self-toxicity maintained

03. Improved detoxification performance

Abstract

To reduce toxic degeneration in a pretrained Language Model (LM), previous work on LM detoxification has focused on reducing the toxicity of the generation itself (self-toxicity) without considering the context. As a result, it ignores a type of implicit offensive language in which the generation supports offensive language in the context. In the LM control tasks of previous work, the desired attributes for generation are fixed. Here, by contrast, the desired stance of the generation depends on the offensiveness of the context. We therefore propose a novel control method that performs context-dependent detoxification and takes stance into account. We introduce meta prefixes to learn the contextualized stance control strategy and to generate the stance control prefix according to the input context. The generated stance prefix is then combined with the…
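The following minimal sketch illustrates what "generating the stance control prefix according to the input context" could look like. It rests on several stated assumptions. The context is already encoded as a single vector, and each meta prefix is a short sequence of vectors with the same dimension. The stance prefix is a softmax-weighted mixture of the meta prefixes, each scored against the context with scaled dot-product attention. The function names are hypothetical, and the paper's actual generator may differ, as may the way the stance prefix is combined with other prefixes. In prefix-tuning, prefixes are trainable activations inserted at every layer. Here they are flattened to plain lists so the example runs without a deep-learning library.

```python
import math
from typing import List

Vector = List[float]
Prefix = List[Vector]  # P prefix positions, each a d-dimensional vector


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(prefix: Prefix) -> Vector:
    """Average a prefix over its positions to get one d-dim summary vector."""
    n = len(prefix)
    return [sum(col) / n for col in zip(*prefix)]


def generate_stance_prefix(context: Vector, meta_prefixes: List[Prefix]) -> Prefix:
    """Mix K meta prefixes into one context-dependent stance prefix.

    Hypothetical scheme (an assumption, not the paper's confirmed design):
    score each meta prefix by the scaled dot product of its mean vector with
    the context embedding, softmax-normalize the scores, and return the
    weighted sum of the meta prefixes position by position.
    """
    scale = math.sqrt(len(context))
    scores = [dot(context, mean_vector(p)) / scale for p in meta_prefixes]
    weights = softmax(scores)
    num_positions = len(meta_prefixes[0])
    dim = len(context)
    return [
        [sum(w * p[i][j] for w, p in zip(weights, meta_prefixes)) for j in range(dim)]
        for i in range(num_positions)
    ]
```

Take `meta = [[[1.0, 0.0]], [[0.0, 1.0]]]` as an example. A zero context vector `[0.0, 0.0]` weights both meta prefixes equally and yields `[[0.5, 0.5]]`. A context strongly aligned with the first meta prefix, such as `[10.0, 0.0]`, yields a prefix close to `[[1.0, 0.0]]`. In training, the meta prefixes (and whatever encodes the context) would be learned so that the mixture encodes the stance appropriate for each context.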

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Topic Modeling