Benchmarking and Improving Generator-Validator Consistency of Language Models

Xiang Lisa Li; Vaishnavi Shrivastava; Siyan Li; Tatsunori Hashimoto; Percy Liang

arXiv:2310.01846·cs.CL·October 4, 2023·2 cites

Open Access

TL;DR

This paper introduces a framework for measuring the consistency between a language model's generation and its validation of answers, and a fine-tuning method that improves this consistency across diverse tasks without labeled data.

Contribution

It proposes a GV-consistency metric and a fine-tuning method, consistency fine-tuning, that improves model consistency, generator quality, and validator accuracy across multiple domains.

Findings

01 GPT-4 is GV-consistent only 76% of the time.

02 Consistency fine-tuning improves the GV-consistency of Alpaca-30B from 60% to 93%.

03 Consistency fine-tuning improves generator quality by 16% and validator accuracy by 6.3%.

Abstract

As of September 2023, ChatGPT correctly answers "what is 7+8" with 15, but when asked "7+8=15, True or False" it responds with "False". This inconsistency between generating and validating an answer is prevalent in language models (LMs) and erodes trust. In this paper, we propose a framework for measuring the consistency between generation and validation (which we call generator-validator consistency, or GV-consistency), finding that even GPT-4, a state-of-the-art LM, is GV-consistent only 76% of the time. To improve the consistency of LMs, we propose to finetune on the filtered generator and validator responses that are GV-consistent, and call this approach consistency fine-tuning. We find that this approach improves GV-consistency of Alpaca-30B from 60% to 93%, and the improvement extrapolates to unseen tasks and domains (e.g., GV-consistency for positive style transfers extrapolates…
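The measure-then-filter idea in the abstract can be illustrated with a minimal sketch. It assumes a simplified reading in which a response is GV-consistent when the validator accepts the generator's own answer; the paper's actual protocol may differ. The names `consistent_pairs`, `gv_consistency`, `toy_generate`, and `toy_validate` are hypothetical, and the toy lookup tables stand in for real model calls.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
# a generator maps a question to an answer, and a validator maps
# (question, answer) to a True/False verdict.
Generator = Callable[[str], str]
Validator = Callable[[str, str], bool]


def consistent_pairs(
    questions: Iterable[str], generate: Generator, validate: Validator
) -> List[Tuple[str, str]]:
    """Keep (question, answer) pairs whose answer the validator accepts.

    Under the simplified assumption above, these are the GV-consistent
    responses that would be retained as fine-tuning data.
    """
    kept = []
    for question in questions:
        answer = generate(question)
        if validate(question, answer):
            kept.append((question, answer))
    return kept


def gv_consistency(
    questions: Iterable[str], generate: Generator, validate: Validator
) -> float:
    """Fraction of questions on which generator and validator agree."""
    question_list = list(questions)
    if not question_list:
        raise ValueError("need at least one question")
    kept = consistent_pairs(question_list, generate, validate)
    return len(kept) / len(question_list)


# Toy stand-ins for a model, mimicking the inconsistency in the abstract:
# the generator answers 7+8 with 15, but the validator rejects "7+8=15".
toy_answers = {"what is 7+8": "15", "what is 2+2": "4"}
toy_verdicts = {("what is 7+8", "15"): False, ("what is 2+2", "4"): True}
toy_questions = list(toy_answers)


def toy_generate(question: str) -> str:
    return toy_answers[question]


def toy_validate(question: str, answer: str) -> bool:
    return toy_verdicts[(question, answer)]


print(gv_consistency(toy_questions, toy_generate, toy_validate))  # prints 0.5
```

In this toy setup only the "what is 2+2" pair survives the filter, so it is the only example that would be passed on to fine-tuning.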

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare

Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization