Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators

Liang Chen; Yang Deng; Yatao Bian; Zeyu Qin; Bingzhe Wu; Tat-Seng Chua; Kam-Fai Wong

arXiv:2310.07289·cs.CL·October 12, 2023

Open Access · 1 Repo

TL;DR

This paper introduces CONNER, a comprehensive framework for evaluating large language models' generated knowledge across multiple dimensions, revealing insights into their factuality, relevance, and coherence in knowledge-intensive tasks.

Contribution

The paper presents CONNER, a new automatic evaluation framework for assessing LLM-generated knowledge from six perspectives, and demonstrates its utility in improving knowledge-intensive tasks.

Findings

01 Factuality of generated knowledge has limited impact on downstream tasks.
02 Relevance and coherence are more critical than factual accuracy.
03 Prompt engineering and knowledge selection improve task performance.

Abstract

Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks when being prompted to generate world knowledge. However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge. In light of this, we introduce CONNER, a COmpreheNsive kNowledge Evaluation fRamework, designed to systematically and automatically evaluate generated knowledge from six important perspectives -- Factuality, Relevance, Coherence, Informativeness, Helpfulness and Validity. We conduct an extensive empirical analysis of the generated knowledge from three different types of LLMs on two widely studied knowledge-intensive tasks, i.e., open-domain question answering and knowledge-grounded dialogue. Surprisingly, our study reveals that the factuality of generated knowledge, even if lower, does not significantly…

Code & Models

Repositories

chanliang/conner
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems