Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification

Tao Meng; Ninareh Mehrabi; Palash Goyal; Anil Ramakrishna; Aram Galstyan; Richard Zemel; Kai-Wei Chang; Rahul Gupta; Charith Peris

arXiv:2410.05559·cs.CL·October 10, 2024

Open Access

TL;DR

This paper introduces a novel constraint learning method for fine-tuning large language models to control specific attributes, demonstrated through reducing toxicity without sacrificing overall performance.

Contribution

It presents a new regularization approach using an auxiliary model to enforce sequence-level constraints during fine-tuning of LLMs.

Findings

01. Reduces toxicity in generated responses

02. Maintains high utility and generation quality

03. Achieves competitive benchmark performance

Abstract

We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control. Given a training corpus and control criteria formulated as a sequence-level constraint on model outputs, our method fine-tunes the LLM on the training corpus while enhancing constraint satisfaction with minimal impact on its utility and generation quality. Specifically, our approach regularizes the LLM training by penalizing the KL divergence between the desired output distribution, which satisfies the constraints, and the LLM's posterior. This regularization term can be approximated by an auxiliary model trained to decompose the sequence-level constraints into token-level guidance, allowing the term to be measured by a closed-form formulation. To further improve efficiency, we design a parallel scheme for concurrently updating both the LLM and the auxiliary model. We evaluate…
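
To make the regularization idea in the abstract concrete, here is a minimal single-step sketch in Python. It is not the authors' implementation. It assumes that the desired distribution is obtained by reweighting the model's next-token probabilities with per-token constraint-satisfaction estimates from an auxiliary model, and that the KL term is taken from the desired distribution to the model's distribution; the abstract specifies neither detail. The names `desired_distribution`, `kl_divergence`, `regularized_loss`, `sat`, and `lam` are hypothetical.

```python
import math


def desired_distribution(p, sat):
    """Reweight next-token probabilities p by satisfaction estimates sat.

    ASSUMPTION: sat[i] stands in for an auxiliary model's estimate that the
    finished sequence will satisfy the constraint if token i is chosen next.
    """
    weights = [p_i * s_i for p_i, s_i in zip(p, sat)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("no token has positive constraint-satisfaction mass")
    return [w / total for w in weights]


def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions over the same vocabulary."""
    return sum(q_i * math.log(q_i / p_i) for q_i, p_i in zip(q, p) if q_i > 0)


def regularized_loss(nll, p, sat, lam=1.0):
    """Fine-tuning loss (nll) plus a weighted KL penalty.

    ASSUMPTION: lam is a hypothetical weight on the regularization term.
    """
    q = desired_distribution(p, sat)
    return nll + lam * kl_divergence(q, p)


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]      # model's next-token distribution (toy values)
    sat = [0.1, 0.9, 0.9]    # hypothetical auxiliary-model estimates
    print(regularized_loss(nll=1.5, p=p, sat=sat, lam=0.5))
```

Under these assumptions the penalty is zero when every token has the same satisfaction estimate, and it grows as the model places more probability on tokens the auxiliary model considers likely to violate the constraint. Because the reweighted distribution is a simple function of `p` and `sat`, the term can be computed directly at each step without sampling full sequences, which is the kind of closed-form token-level measurement the abstract alludes to.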

Taxonomy

Topics: Topic Modeling