Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
Irene Solaiman (1), Christy Dennison (1) ((1) OpenAI)

TL;DR
PALMS is an iterative process that fine-tunes language models on values-targeted datasets to reduce harmful biases and align outputs with a chosen set of societal values; it is effective across GPT-3 model sizes.
Contribution
The paper introduces PALMS, a novel iterative fine-tuning method using curated datasets to align language models with societal values without losing capability.
Findings
PALMS significantly improves adherence to target values and reduces toxicity.
Effectiveness of PALMS increases with larger model sizes.
A small, curated dataset can substantially alter model behavior.
Abstract
Language models can generate harmful and biased outputs and exhibit undesirable behavior according to a given cultural context. We propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values. We evaluate our process using three metrics: quantitative metrics with human evaluations that score output adherence to a target value, and toxicity scoring on outputs; and qualitative metrics analyzing the most common word associated with a given social category. Through each iteration, we add additional training dataset examples based on observed shortcomings from evaluations. PALMS performs significantly better on all metrics compared to baseline and control models for a broad range of GPT-3 language model sizes…
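The iterative process described in the abstract (fine-tune, evaluate, add examples that target observed shortcomings, repeat) can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the function name `palms_loop`, the `(prompt, completion)` example format, the three callables, and the `max_iterations` cap are all assumptions introduced here.

```python
from typing import Callable, List, Tuple

# Assumed example format: a (prompt, completion) pair.
Example = Tuple[str, str]


def palms_loop(
    dataset: List[Example],
    fine_tune: Callable[[List[Example]], object],
    evaluate: Callable[[object], List[str]],
    write_examples: Callable[[List[str]], List[Example]],
    max_iterations: int = 3,
) -> Tuple[object, List[Example]]:
    """Fine-tune, evaluate, then extend the dataset to cover shortcomings."""
    dataset = list(dataset)  # copy so the caller's list is not mutated
    model = fine_tune(dataset)
    for _ in range(max_iterations):
        shortcomings = evaluate(model)
        if not shortcomings:
            break  # evaluations found nothing left to address
        dataset.extend(write_examples(shortcomings))
        model = fine_tune(dataset)
    return model, dataset


# Toy stand-ins so the sketch runs end to end (hypothetical, not real tooling).
REQUIRED_TOPICS = {"health", "justice", "religion"}


def toy_fine_tune(data: List[Example]) -> set:
    # The "model" is just the set of topics the dataset covers.
    return {prompt for prompt, _ in data}


def toy_evaluate(model: set) -> List[str]:
    # A "shortcoming" is any required topic the model does not cover.
    return sorted(REQUIRED_TOPICS - model)


def toy_write_examples(shortcomings: List[str]) -> List[Example]:
    return [(topic, f"curated answer about {topic}") for topic in shortcomings]


model, final_dataset = palms_loop(
    [("health", "curated answer about health")],
    toy_fine_tune,
    toy_evaluate,
    toy_write_examples,
)
```

In practice the evaluation step would combine the human adherence ratings, toxicity scores, and word-association analysis named in the abstract, and the new examples would be written by people rather than generated by a function.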
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Weight Decay · Dropout · Layer Normalization
