Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

Boxin Wang; Wei Ping; Chaowei Xiao; Peng Xu; Mostofa Patwary; Mohammad Shoeybi; Bo Li; Anima Anandkumar; Bryan Catanzaro

arXiv:2202.04173 · cs.CL · October 25, 2022 · 20 cites

TL;DR

This paper investigates domain-adaptive training methods to reduce toxicity in large-scale language models, proposing self-generated non-toxic datasets and parameter-efficient techniques, demonstrating improved detoxification across various model sizes.

Contribution

The paper introduces a self-generation approach for creating non-toxic training datasets and evaluates detoxification methods on models of up to 530B parameters, highlighting the effectiveness of adapter layers.
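To give a sense of why adapter layers are parameter-efficient, here is a minimal sketch that counts the trainable parameters added by bottleneck adapters when the base LM stays frozen. It assumes the common two-projection adapter design (a down-projection from width d to a bottleneck of width m and an up-projection back to d, each with a bias) and one adapter per transformer layer. The paper's exact adapter placement and sizes are not given here, so the names `ModelConfig`, `adapter_params`, and `trainable_fraction`, as well as the example shape and bottleneck width, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Illustrative shape of a frozen base LM (values supplied by the caller)."""
    hidden_size: int      # d: model width
    num_layers: int       # number of transformer layers
    total_params: float   # parameter count of the frozen base model


def adapter_params(hidden_size: int, bottleneck: int) -> int:
    """Parameters in one bottleneck adapter (assumed design):
    down-projection d x m plus bias m, up-projection m x d plus bias d."""
    return 2 * hidden_size * bottleneck + hidden_size + bottleneck


def trainable_fraction(cfg: ModelConfig, bottleneck: int,
                       adapters_per_layer: int = 1) -> float:
    """Size of all added adapters relative to the frozen base model."""
    added = cfg.num_layers * adapters_per_layer * adapter_params(cfg.hidden_size, bottleneck)
    return added / cfg.total_params


if __name__ == "__main__":
    # Hypothetical ~530B-scale shape and bottleneck width, for illustration only.
    cfg = ModelConfig(hidden_size=20480, num_layers=105, total_params=530e9)
    frac = trainable_fraction(cfg, bottleneck=256)
    print(f"adapters add {frac:.2%} of the base parameters")
```

With these assumed values the adapters amount to roughly 0.2% of the base model, so detoxification only has to update a small set of weights while the pre-trained parameters stay untouched.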

Findings

01

Self-generated datasets outperform curated ones in detoxification.

02

Larger models show toxicity levels similar to those of smaller models trained on the same pre-training data.

03

Adapter-based training achieves better toxicity-perplexity trade-offs.
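The toxicity and perplexity findings rest on automatic metrics. A minimal sketch, assuming the RealToxicityPrompts-style definitions often used in detoxification work: Expected Maximum Toxicity (the maximum toxicity over several generations for each prompt, averaged over prompts), Toxicity Probability (the share of prompts with at least one generation scored at or above 0.5), and perplexity as the exponentiated mean negative log-likelihood per token. Toxicity scores are assumed to come from an external classifier and are passed in as plain numbers; the function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence


def expected_max_toxicity(scores_per_prompt: List[Sequence[float]]) -> float:
    """Average, over prompts, of the highest toxicity among that prompt's generations."""
    return mean(max(scores) for scores in scores_per_prompt)


def toxicity_probability(scores_per_prompt: List[Sequence[float]],
                         threshold: float = 0.5) -> float:
    """Fraction of prompts with at least one generation scored >= threshold."""
    hits = sum(1 for scores in scores_per_prompt if max(scores) >= threshold)
    return hits / len(scores_per_prompt)


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative natural-log likelihood over all tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Toy scores: two prompts, three generations each.
    scores = [[0.1, 0.7, 0.2], [0.05, 0.3, 0.4]]
    print(expected_max_toxicity(scores))      # about 0.55
    print(toxicity_probability(scores))       # 0.5
    print(perplexity([math.log(0.25)] * 4))   # about 4.0
```

A detoxification method improves the trade-off when it lowers the two toxicity metrics without raising perplexity on held-out text by much.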

Abstract

Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we propose to leverage the generative power of LMs and generate nontoxic datasets for domain-adaptive training, which mitigates the exposure bias and is shown to be more data-efficient than using a curated pre-training corpus. We demonstrate that the self-generation method consistently outperforms the existing baselines across various model sizes on both automatic and human evaluations, even when it uses a 1/3 smaller training corpus. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3x larger than GPT-3), a scale that has never been…
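The self-generation idea in the abstract can be pictured as three steps: sample text from the LM itself, score each sample for toxicity, and keep only the least toxic samples as the domain-adaptive training corpus, so the adaptation data stays close to the model's own output distribution. The code below is a minimal sketch under stated assumptions: `generate` and `score_toxicity` are hypothetical stand-ins for an LM sampler and a toxicity classifier, and keeping a fixed fraction of the lowest-scoring samples is an assumed filtering rule, not necessarily the paper's exact recipe.

```python
from typing import Callable, List


def build_self_generated_corpus(
    generate: Callable[[], str],             # hypothetical: draws one sample from the LM
    score_toxicity: Callable[[str], float],  # hypothetical: classifier score in [0, 1]
    num_samples: int,
    keep_fraction: float,
) -> List[str]:
    """Sample from the model itself, then keep the least toxic fraction
    as the corpus for domain-adaptive training (assumed filtering rule)."""
    samples = [generate() for _ in range(num_samples)]
    # sorted() evaluates the key once per sample, so each text is scored once.
    ranked = sorted(samples, key=score_toxicity)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without a real LM or classifier.
    pool = iter(["text a", "text b", "text c", "text d"])
    toy_scores = {"text a": 0.9, "text b": 0.1, "text c": 0.5, "text d": 0.2}
    corpus = build_self_generated_corpus(
        generate=lambda: next(pool),
        score_toxicity=toy_scores.__getitem__,
        num_samples=4,
        keep_fraction=0.5,
    )
    print(corpus)  # ['text b', 'text d']
```

The adaptation step itself (continuing LM training on the filtered corpus, either on all weights or on adapter layers only) is omitted from this sketch.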

Code & Models

Repositories

NVIDIA/Megatron-LM
PyTorch · Official

Videos

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning