KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models

Yuyang Bai; Shangbin Feng; Vidhisha Balachandran; Zhaoxuan Tan; Shiqi Lou; Tianxing He; Yulia Tsvetkov

arXiv:2310.09725 · cs.CL · March 26, 2024 · 1 citation

TL;DR

KGQuiz is a comprehensive benchmark designed to evaluate the knowledge generalization abilities of large language models across multiple domains and task complexities, revealing strengths in simple tasks and challenges in complex reasoning.

Contribution

The paper introduces KGQuiz, a scalable benchmark built from triplet-based knowledge that spans three knowledge domains and five task formats of increasing complexity, to systematically assess how well LLMs' knowledge generalizes.

Findings

01. LLMs perform well on simple knowledge QA tasks.
02. Complex reasoning and domain-specific tasks remain challenging for LLMs.
03. KGQuiz enables nuanced analysis of LLMs' knowledge abilities across domains.

Abstract

Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, beyond explorations of a few probing tasks in limited knowledge domains, it is not well understood how to evaluate LLMs' knowledge systematically or how well their knowledge abilities generalize across a spectrum of knowledge domains and progressively complex task formats. To this end, we propose KGQuiz, a knowledge-intensive benchmark to comprehensively investigate the knowledge generalization abilities of LLMs. KGQuiz is a scalable framework constructed from triplet-based knowledge, which covers three knowledge domains and consists of five tasks with increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation. To gain a better understanding…
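The abstract describes KGQuiz as built from triplet-based knowledge and expanded into tasks of increasing complexity. As a rough illustration only, the Python sketch below shows one way a single (head, relation, tail) triplet could be templated into the first three formats: true-or-false, multiple-choice QA, and blank filling. It is not the authors' code from leopoldwhite/kgquiz. The `Triplet` class, the verbalization template, the helper names, and the caller-supplied distractor entities are all hypothetical. Factual editing and open-ended generation need model outputs and scoring beyond simple templates, so they are left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for one (head, relation, tail) knowledge-graph fact."""
    head: str
    relation: str
    tail: str


def statement(t: Triplet, tail: str) -> str:
    # Hypothetical verbalization template; real templates would likely be relation-specific.
    return f"{t.head} {t.relation} {tail}."


def true_or_false(t: Triplet, distractor: str, make_false: bool) -> tuple[str, bool]:
    """Return a statement and its gold label; a false item swaps in a distractor tail."""
    if make_false:
        return statement(t, distractor), False
    return statement(t, t.tail), True


def multiple_choice(t: Triplet, distractors: list[str], seed: int = 0) -> tuple[str, list[str], int]:
    """Return a question, shuffled options, and the index of the correct option.

    How distractors are chosen (random sampling, embedding similarity, ...) is an
    assumption left to the caller here.
    """
    options = [t.tail] + distractors
    random.Random(seed).shuffle(options)  # seeded so the option order is reproducible
    question = f"{t.head} {t.relation} which of the following?"
    return question, options, options.index(t.tail)


def blank_filling(t: Triplet) -> tuple[str, str]:
    """Mask the tail entity; the model is expected to generate it."""
    return statement(t, "[BLANK]"), t.tail


if __name__ == "__main__":
    fact = Triplet("Paris", "is the capital of", "France")
    print(true_or_false(fact, "Germany", make_false=True))  # ('Paris is the capital of Germany.', False)
    question, options, gold = multiple_choice(fact, ["Germany", "Italy", "Spain"])
    print(options[gold])  # France
    print(blank_filling(fact))  # ('Paris is the capital of [BLANK].', 'France')
```

In this sketch, the same triplet backs all three items, so difficulty grows with the format (recognizing a statement, choosing among options, generating the entity) while the underlying fact stays fixed.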

Code & Models

Repositories

leopoldwhite/kgquiz (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods