Statistical Knowledge Assessment for Large Language Models

Qingxiu Dong; Jingjing Xu; Lingpeng Kong; Zhifang Sui; Lei Li

arXiv:2305.10519 · cs.CL · October 31, 2023 · 2 citations

TL;DR

This paper introduces KaRR, a statistical method that quantifies factual knowledge in large language models by estimating how much more likely a model is to generate the correct answer across diverse prompts than by random chance. Its results correlate with human assessment (Kendall's τ of 0.43).

Contribution

The paper presents KaRR, a statistical approach for assessing factual knowledge in LLMs, together with an assessment suite of 994,123 entities, 600 relations, and 1,395,905 text aliases, and an analysis of how model scale and tuning affect knowledge across 20 LLMs.

Findings

01 KaRR reaches a Kendall's τ of 0.43 against human assessments, which the authors describe as a strong correlation.

02 Knowledge in LLMs that share a backbone architecture follows the scaling law.

03 Instruction tuning may reduce how reliably an LLM generates factually correct text.
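
To make the Kendall's τ figure in the findings above easier to interpret, here is a minimal sketch of the tau-a variant, which counts concordant and discordant pairs and ignores ties. This is an illustration only: the source does not say which variant of τ the authors computed, the function name `kendall_tau_a` is hypothetical, and the scores are made-up numbers, not results from the paper.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys):
        raise ValueError("both score lists must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two items to form a pair")

    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both rankings order this pair the same way
        elif sign < 0:
            discordant += 1   # the rankings disagree on this pair

    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up example: an automatic metric and human raters rank four models.
metric_scores = [1, 2, 3, 4]
human_scores = [1, 3, 2, 4]
tau = kendall_tau_a(metric_scores, human_scores)
print(round(tau, 3))
```

In this example five of the six pairs agree and one disagrees, so τ = (5 - 1) / 6. Under the same no-ties assumption, τ = 0.43 would correspond to roughly 71.5% of pairs being ordered the same way by both rankings, since the concordant share is (1 + τ) / 2.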

Abstract

Given varying prompts regarding a factoid question, can a large language model (LLM) reliably generate factually correct answers? Existing LLMs may generate distinct responses for different prompts. In this paper, we study the problem of quantifying knowledge contained in an LLM regarding a given set of facts. We propose KaRR, a statistical approach to assess factual knowledge for LLMs. The main idea is to estimate the ratio between the probability that an LLM generates text corresponding to the answer entity, given diverse prompts for the subject and the queried relation, and the probability that it generates that text by random chance. Our assessment suite contains a comprehensive set of 994,123 entities and 600 relations, with 1,395,905 text aliases. We use our method to evaluate 20 LLMs of various sizes, including LLaMA, Alpaca, OPT, etc. Experiments show that our results have a strong correlation (0.43 Kendall's $\tau$) with the…
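
The "ratio" idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's KaRR formula, which this page does not give. It assumes we already have the model's probability of producing the answer under several prompts, plus a baseline probability of producing the same answer by chance. The function name `knowledge_ratio`, the use of an arithmetic mean, and all numbers are assumptions made for illustration.

```python
from statistics import fmean


def knowledge_ratio(prompt_probs, chance_prob):
    """Mean answer probability across prompts divided by a chance baseline.

    prompt_probs: probabilities of the answer entity under diverse prompts
                  (hypothetical inputs, e.g. read from model likelihoods).
    chance_prob:  probability of producing the answer text by chance
                  (hypothetical baseline).
    """
    if not prompt_probs:
        raise ValueError("need at least one prompt probability")
    if chance_prob <= 0:
        raise ValueError("chance probability must be positive")
    return fmean(prompt_probs) / chance_prob


# Made-up numbers: three paraphrased prompts for one (subject, relation) pair.
prompt_probs = [0.5, 0.25, 0.75]
chance_prob = 0.125
ratio = knowledge_ratio(prompt_probs, chance_prob)
print(ratio)
```

A ratio well above 1 would suggest the prompts make the answer much more likely than chance, while a ratio near 1 would suggest the model is not drawing on the fact. Averaging over prompts is the point of the exercise: a single prompt can over- or understate what the model knows.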

Videos

Statistical Knowledge Assessment for Large Language Models · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods