How Reliable are LLMs as Knowledge Bases? Re-thinking Factuality and Consistency

Danna Zheng; Mirella Lapata; Jeff Z. Pan

arXiv:2407.13578 · cs.CL · December 17, 2024 · 2 cites

TL;DR

This paper critically examines the reliability of large language models as knowledge bases by introducing new evaluation criteria, datasets, and metrics focused on factuality and consistency, revealing significant challenges in their use.

Contribution

The paper proposes a framework for evaluating LLMs as knowledge bases, comprising the UnseenQA dataset and new metrics for factuality and consistency.

Findings

01. The 26 LLMs evaluated vary significantly in factuality and consistency.

02. Current evaluation methods are insufficient for assessing LLM reliability.

03. More rigorous and holistic evaluation approaches are needed.

Abstract

Large Language Models (LLMs) are increasingly explored as knowledge bases (KBs), yet current evaluation methods focus too narrowly on knowledge retention, overlooking other crucial criteria for reliable performance. In this work, we rethink the requirements for evaluating reliable LLM-as-KB usage and highlight two essential factors: factuality, ensuring accurate responses to seen and unseen knowledge, and consistency, maintaining stable answers to questions about the same knowledge. We introduce UnseenQA, a dataset designed to assess LLM performance on unseen knowledge, and propose new criteria and metrics to quantify factuality and consistency, leading to a final reliability score. Our experiments on 26 LLMs reveal several challenges regarding their use as KBs, underscoring the need for more principled and comprehensive evaluation.
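
The abstract names the two criteria but does not give the formulas behind them, so the following is only an illustrative sketch of how such quantities could be computed, not the paper's own metrics. All names here (`ABSTAIN`, `seen_factuality`, `unseen_factuality`, `consistency`) are hypothetical, and the sketch assumes answers are compared by exact string match and that an uninformative reply is normalized to a single marker.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical marker for an uninformative ("I don't know") answer.
ABSTAIN = "unsure"


def seen_factuality(answers: List[str], gold: List[str]) -> float:
    """Share of correct answers minus share of wrong ones.

    Abstentions count as neither correct nor wrong (an assumption of this sketch).
    """
    correct = sum(a == g for a, g in zip(answers, gold))
    wrong = sum(a != g and a != ABSTAIN for a, g in zip(answers, gold))
    return (correct - wrong) / len(gold)


def unseen_factuality(answers: List[str]) -> float:
    """Share of questions about unseen knowledge on which the model abstains."""
    return sum(a == ABSTAIN for a in answers) / len(answers)


def consistency(groups: Dict[str, List[str]]) -> float:
    """Mean share of answers per group that match the group's most common answer.

    Each group holds the answers to several questions about the same fact.
    """
    shares = []
    for answers in groups.values():
        top_count = Counter(answers).most_common(1)[0][1]
        shares.append(top_count / len(answers))
    return sum(shares) / len(shares)
```

For example, a model that answers two of four seen questions correctly, one incorrectly, and abstains on the fourth gets a seen-knowledge score of 0.25 under this sketch. How the paper combines its factuality and consistency measures into the final reliability score is not stated in the abstract, so no combination is attempted here.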

Taxonomy

Topics: Artificial Intelligence in Law

Methods: Attention Is All You Need · Focus · Cosine Annealing · Linear Layer · Weight Decay · Softmax · Multi-Head Attention · Dense Connections