GRATH: Gradual Self-Truthifying for Large Language Models

Weixin Chen; Dawn Song; Bo Li

arXiv:2401.12292 · cs.CL · February 1, 2024 · 2 cites

Open Access · 3 Models · 1 Dataset

TL;DR

GRATH is a novel self-supervised post-processing method that iteratively improves the truthfulness of large language models by using pairwise training data and preference optimization, achieving state-of-the-art results on TruthfulQA.

Contribution

The paper introduces GRATH, a new iterative self-truthifying approach that enhances LLM truthfulness without sacrificing other capabilities, outperforming larger models on benchmarks.

Findings

01. GRATH significantly improves the truthfulness of 7B LLMs.

02. It achieves state-of-the-art accuracy on TruthfulQA.

03. It enhances model truthfulness without degrading other capabilities.

Abstract

Truthfulness is paramount for large language models (LLMs) as they are increasingly deployed in real-world applications. However, existing LLMs still struggle with generating truthful content, as evidenced by their modest performance on benchmarks like TruthfulQA. To address this issue, we propose GRAdual self-truTHifying (GRATH), a novel post-processing method to enhance truthfulness of LLMs. GRATH utilizes out-of-domain question prompts to generate pairwise truthfulness training data with each pair containing a question and its correct and incorrect answers, and then optimizes the model via direct preference optimization (DPO) to learn from the truthfulness difference between answer pairs. GRATH iteratively refines truthfulness data and updates the model, leading to a gradual improvement in model truthfulness in a self-supervised manner. Empirically, we evaluate GRATH using different…
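
The loop described in the abstract (generate answer pairs, optimize with DPO, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `dpo_loss` is the standard per-pair DPO objective, while `gradual_self_truthify`, `generate_pair`, `dpo_update`, and the default `beta` and `num_iterations` values are hypothetical names and settings introduced here. The abstract does not say how the data is refined between rounds, so the sketch simply regenerates every pair with the current model.

```python
import math
from typing import Any, Callable, List, Tuple

# (question, correct_answer, incorrect_answer); this layout is an assumption.
Pair = Tuple[str, str, str]


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) answer pair.

    Inputs are summed log-probabilities of each answer under the policy being
    trained and under a frozen reference model. beta=0.1 is an illustrative
    default, not a value taken from the paper.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def gradual_self_truthify(model: Any,
                          questions: List[str],
                          generate_pair: Callable[[Any, str], Pair],
                          dpo_update: Callable[[Any, List[Pair]], Any],
                          num_iterations: int = 2) -> Any:
    """Hypothetical outer loop: build pairs with the current model, then update it."""
    for _ in range(num_iterations):
        # Assumption: all pairs are regenerated each round by the current model.
        pairs = [generate_pair(model, q) for q in questions]
        # Assumption: dpo_update minimizes dpo_loss over the pairs and
        # returns the updated model.
        model = dpo_update(model, pairs)
    return model
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls as the policy raises the correct answer's likelihood relative to the incorrect one. In practice the update step would be delegated to a training library rather than written by hand.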

Code & Models

Models

Datasets

weixinchen/GRATH
dataset · 11 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques