Quark: Controllable Text Generation with Reinforced Unlearning

Ximing Lu; Sean Welleck; Jack Hessel; Liwei Jiang; Lianhui Qin; Peter West; Prithviraj Ammanabrolu; Yejin Choi

arXiv:2205.13636 · cs.CL · November 18, 2022 · 45 citations

TL;DR

Quark is a novel method for fine-tuning language models to unlearn undesirable behaviors like toxicity and repetition by using reward-based conditioning, outperforming existing reinforcement learning approaches.

Contribution

Introduces Quantized Reward Konditioning (Quark), a new algorithm for controlled unlearning in language models that leverages reward quantiles and standard language modeling techniques.

Findings

01 Quark effectively reduces toxicity, negative sentiment, and repetition in generated text.

02 Quark outperforms PPO and other baselines at unlearning undesirable behaviors.

03 Quark relies solely on standard language modeling primitives, which simplifies implementation.

Abstract

Large-scale language models often learn behaviors that are misaligned with user expectations. Generated text may contain offensive or toxic language, contain significant repetition, or be of a different sentiment than desired by the user. We consider the task of unlearning these misalignments by fine-tuning the language model on signals of what not to do. We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model. Quark alternates between (i) collecting samples with the current language model, (ii) sorting them into quantiles based on reward, with each quantile identified by a reward token prepended to the language model's input, and (iii) using a standard language modeling loss on samples from each quantile conditioned on its reward token, while remaining…
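
Steps (ii) and (iii) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `<rw_k>` token format, and the equal-size bucketing rule are all assumptions made for illustration. It shows only how sampled continuations might be sorted into reward quantiles and paired with a reward token prepended to the input.

```python
from typing import List, Tuple


def assign_quantiles(rewards: List[float], num_quantiles: int) -> List[int]:
    """Map each reward to a quantile index (0 = lowest-reward bucket).

    Assumption: buckets are formed by rank so that they are roughly equal in size.
    """
    if num_quantiles < 1:
        raise ValueError("num_quantiles must be at least 1")
    n = len(rewards)
    # Indices of the samples, ordered from lowest to highest reward.
    order = sorted(range(n), key=lambda i: rewards[i])
    buckets = [0] * n
    for rank, idx in enumerate(order):
        buckets[idx] = rank * num_quantiles // n
    return buckets


def reward_token(quantile: int) -> str:
    """Hypothetical token format; any reserved vocabulary item would do."""
    return f"<rw_{quantile}>"


def build_conditioned_examples(
    samples: List[Tuple[str, str, float]], num_quantiles: int
) -> List[Tuple[str, str]]:
    """Turn (prompt, continuation, reward) triples into (input, target) pairs.

    The reward token is prepended to the prompt, so a standard language
    modeling loss on the target is conditioned on the sample's quantile.
    """
    rewards = [reward for _, _, reward in samples]
    buckets = assign_quantiles(rewards, num_quantiles)
    return [
        (reward_token(q) + prompt, continuation)
        for (prompt, continuation, _), q in zip(samples, buckets)
    ]


if __name__ == "__main__":
    demo = [
        ("The weather is", " awful and I hate it.", 0.1),
        ("The weather is", " lovely today.", 0.9),
        ("The weather is", " fine, I guess.", 0.5),
        ("The weather is", " pretty bad.", 0.3),
    ]
    for model_input, target in build_conditioned_examples(demo, num_quantiles=2):
        print(repr(model_input), "->", repr(target))
```

In a real training loop the `(input, target)` pairs would be tokenized and fed to an ordinary next-token loss. At generation time, one would presumably condition on the token of the highest-reward bucket.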

Code & Models

Repositories

gximinglu/quark
PyTorch · Official

Models

🤗 sauc-abadal-lloret/gpt-j-6b-ALT-Quark-tldr
model · 1 dl

Videos

QUARK: Controllable Text Generation with Reinforced Unlearning · SlidesLive

Taxonomy

Topics: Topic Modeling

Methods: Entropy Regularization · Proximal Policy Optimization