Trusted Source Alignment in Large Language Models
Vasilisa Bashlovkina, Zhaobin Kuang, Riley Matthews, Edward Clifford, Yennie Jun, William W. Cohen, Simon Baumgartner

TL;DR
This paper introduces trusted source alignment (TSA) as a property of large language models, proposes the FactCheckQA evaluation dataset, and shows that PaLM-2 aligns better with trusted sources as model size grows, reaching up to 80% balanced accuracy.
Contribution
The paper defines TSA, creates the FactCheckQA dataset for its evaluation, and analyzes how model size impacts alignment with trusted sources.
Findings
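Model performance on FactCheckQA improves with model size, starting from near-random
PaLM-2 reaches up to 80% balanced accuracy on FactCheckQA
The evaluation protocol accounts for response extraction, claim contextualization, and bias in prompt formulation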
Abstract
Large language models (LLMs) are trained on web-scale corpora that inevitably include contradictory factual information from sources of varying reliability. In this paper, we propose measuring an LLM property called trusted source alignment (TSA): the model's propensity to align with content produced by trusted publishers in the face of uncertainty or controversy. We present FactCheckQA, a TSA evaluation dataset based on a corpus of fact checking articles. We describe a simple protocol for evaluating TSA and offer a detailed analysis of design considerations including response extraction, claim contextualization, and bias in prompt formulation. Applying the protocol to PaLM-2, we find that as we scale up the model size, the model performance on FactCheckQA improves from near-random to up to 80% balanced accuracy in aligning with trusted sources.
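The headline metric, balanced accuracy, is the mean of per-class recall. The sketch below is a minimal illustration for a binary verdict task, assuming each claim carries a true/false gold label and the model's response has already been reduced to a true/false prediction. The function name and the toy data are hypothetical and are not taken from the paper.

```python
def balanced_accuracy(labels, predictions):
    """Mean of per-class recall for binary (True/False) labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    recalls = []
    for cls in (True, False):
        # Positions of the examples whose gold label is this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        if not idx:
            continue  # a class absent from the gold labels contributes nothing
        correct = sum(1 for i in idx if predictions[i] == cls)
        recalls.append(correct / len(idx))
    if not recalls:
        raise ValueError("no labelled examples")
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data: 2 true claims and 8 false claims.
labels = [True] * 2 + [False] * 8
# A degenerate model that answers "false" to every claim.
always_false = [False] * 10

plain_accuracy = sum(p == y for p, y in zip(always_false, labels)) / len(labels)
print(plain_accuracy)                            # 0.8
print(balanced_accuracy(labels, always_false))   # 0.5
```

On this toy data a model that always answers "false" gets 0.8 plain accuracy but only 0.5 balanced accuracy, which is the chance level for a binary task. This is why balanced accuracy is the more informative number when the two classes are unevenly represented.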
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: ALIGN
