GeDi: Generative Discriminator Guided Sequence Generation
Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, Nazneen Fatema Rajani

TL;DR
GeDi uses smaller language models as generative discriminators to guide large language models toward safer, more controllable text. It generates faster than the previous state-of-the-art control method and supports zero-shot topic control.
Contribution
Introduces GeDi, a novel approach that guides large language models with smaller discriminators for improved safety and controllability, including zero-shot topic control.
Findings
GeDi gives stronger controllability than the previous state-of-the-art method.
GeDi generates text more than 30 times faster than that method.
Successfully reduces toxicity in GPT-2 without losing linguistic quality.
Abstract
While large-scale language models (LMs) are able to imitate the distribution of natural language well enough to generate realistic text, it is difficult to control which regions of the distribution they generate. This is especially problematic because datasets used for training large LMs usually contain significant toxicity, hate, bias, and negativity. We propose GeDi as an efficient method for using smaller LMs as generative discriminators to guide generation from large LMs to make them safer and more controllable. GeDi guides generation at each step by computing classification probabilities for all possible next tokens via Bayes' rule by normalizing over two class-conditional distributions: one conditioned on the desired attribute, or control code, and another conditioned on the undesired attribute, or anti control code. We find that GeDi gives stronger controllability than the state…
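The per-step Bayes' rule computation described in the abstract can be sketched in plain Python. This is a minimal sketch, not the authors' implementation. It assumes that the large LM and the GeDi class-conditional LM share one vocabulary, that raw next-token logits are available from both, and that the two classes have equal priors. The helper names (`log_softmax`, `gedi_class_posterior`, `guided_next_token_dist`) are hypothetical. The rule that combines the posterior with the large LM, which adds `omega` times the log posterior to the large LM's log-probabilities, is also an illustrative assumption and does not come from the text above.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def gedi_class_posterior(
    prefix_logp_pos: float,
    prefix_logp_neg: float,
    next_logp_pos: Sequence[float],
    next_logp_neg: Sequence[float],
    log_prior_pos: float = math.log(0.5),
    log_prior_neg: float = math.log(0.5),
) -> List[float]:
    """Log P(c | x_<t, x_t) for every candidate next token x_t, via Bayes' rule.

    prefix_logp_*: summed log-likelihood of the tokens generated so far under
    the control code (pos) and the anti control code (neg).
    next_logp_*: log P(x_t | x_<t, code) for each vocabulary entry.
    """
    log_posts = []
    for lp_pos, lp_neg in zip(next_logp_pos, next_logp_neg):
        a = log_prior_pos + prefix_logp_pos + lp_pos  # log joint with control code
        b = log_prior_neg + prefix_logp_neg + lp_neg  # log joint with anti control code
        m = max(a, b)
        # Normalize over the two class-conditional hypotheses: a - logsumexp(a, b).
        log_posts.append(a - (m + math.log(math.exp(a - m) + math.exp(b - m))))
    return log_posts


def guided_next_token_dist(
    base_logits: Sequence[float],
    gedi_logits_pos: Sequence[float],
    gedi_logits_neg: Sequence[float],
    prefix_logp_pos: float = 0.0,
    prefix_logp_neg: float = 0.0,
    omega: float = 1.0,
) -> List[float]:
    """Reweight the large LM's next-token distribution by the GeDi posterior.

    ASSUMPTION (hypothetical combination rule): log p_LM + omega * log P(c | ...),
    where omega sets the strength of the guidance.
    """
    base_lp = log_softmax(base_logits)
    log_post = gedi_class_posterior(
        prefix_logp_pos,
        prefix_logp_neg,
        log_softmax(gedi_logits_pos),
        log_softmax(gedi_logits_neg),
    )
    combined = [b + omega * p for b, p in zip(base_lp, log_post)]
    # Renormalize so the guided scores form a probability distribution.
    return [math.exp(x) for x in log_softmax(combined)]


# Usage: a toy 3-token vocabulary with a uniform large LM.
probs = guided_next_token_dist([0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0])
print([round(p, 3) for p in probs])  # → [0.587, 0.079, 0.333]
```

In the toy example, token 0 is favored by the control code and token 1 by the anti control code, so guidance shifts probability mass toward token 0. Token 2 is equally likely under both codes, so its posterior stays at 0.5. The sketch works in log space so that products of many per-token probabilities do not underflow. With `omega=0`, it returns the large LM's distribution unchanged.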
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
Methods: Linear Layer · Cosine Annealing · Layer Normalization · Weight Decay · Dropout · Dense Connections · Linear Warmup With Cosine Annealing · Attention Dropout · Byte Pair Encoding · Multi-Head Attention
