Trusted Source Alignment in Large Language Models

Vasilisa Bashlovkina; Zhaobin Kuang; Riley Matthews; Edward Clifford; Yennie Jun; William W. Cohen; Simon Baumgartner

arXiv:2311.06697 · cs.CL · November 14, 2023 · 1 citation

TL;DR

This paper introduces trusted source alignment (TSA) as a property of large language models, presents FactCheckQA, a dataset for evaluating it, and shows that as PaLM-2 is scaled up, its performance on FactCheckQA improves from near-random to up to 80% balanced accuracy.

Contribution

The paper defines TSA, creates the FactCheckQA dataset for its evaluation, and analyzes how model size impacts alignment with trusted sources.

Findings

01

Model performance on FactCheckQA improves with size

02

PaLM-2 reaches up to 80% balanced accuracy on FactCheckQA

03

The evaluation protocol accounts for response extraction, claim contextualization, and bias in prompt formulation

Abstract

Large language models (LLMs) are trained on web-scale corpora that inevitably include contradictory factual information from sources of varying reliability. In this paper, we propose measuring an LLM property called trusted source alignment (TSA): the model's propensity to align with content produced by trusted publishers in the face of uncertainty or controversy. We present FactCheckQA, a TSA evaluation dataset based on a corpus of fact checking articles. We describe a simple protocol for evaluating TSA and offer a detailed analysis of design considerations including response extraction, claim contextualization, and bias in prompt formulation. Applying the protocol to PaLM-2, we find that as we scale up the model size, the model performance on FactCheckQA improves from near-random to up to 80% balanced accuracy in aligning with trusted sources.
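The abstract reports results as balanced accuracy rather than plain accuracy. As a minimal sketch of what that metric measures, the Python below computes balanced accuracy (the mean of the per-class recalls) for binary labels. The function name, the boolean label encoding, and the toy data are assumptions made for illustration; they are not taken from the paper or from FactCheckQA.

```python
def balanced_accuracy(labels, predictions):
    """Mean of the recall on the positive class and the recall on the negative class.

    `labels` and `predictions` are equal-length sequences of booleans.
    The encoding is hypothetical: True could mean "a trusted source rates
    the claim as true", False the opposite.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present to compute balanced accuracy")

    # Correct predictions within each class.
    true_positives = sum(1 for y, p in zip(labels, predictions) if y and p)
    true_negatives = sum(1 for y, p in zip(labels, predictions) if not y and not p)

    recall_pos = true_positives / positives
    recall_neg = true_negatives / negatives
    return 0.5 * (recall_pos + recall_neg)


# Toy, made-up data: four claims labeled True, one labeled False.
toy_labels = [True, True, True, True, False]

# A model that always answers True gets 80% plain accuracy on this skewed set,
# but its balanced accuracy is only chance level.
always_true = [True] * len(toy_labels)
plain_accuracy = sum(y == p for y, p in zip(toy_labels, always_true)) / len(toy_labels)
chance_level = balanced_accuracy(toy_labels, always_true)
```

On a skewed label distribution, a model that always gives the same answer can score high plain accuracy while its balanced accuracy stays at 0.5, which is why "near-random" in the abstract corresponds to a balanced accuracy near 50%.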

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN