Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators
Liang Chen, Yang Deng, Yatao Bian, Zeyu Qin, Bingzhe Wu, Tat-Seng Chua, Kam-Fai Wong

TL;DR
This paper introduces CONNER, a comprehensive framework for evaluating large language models' generated knowledge across multiple dimensions, revealing insights into their factuality, relevance, and coherence in knowledge-intensive tasks.
Contribution
The paper presents CONNER, a new automatic evaluation framework for assessing LLM-generated knowledge from six perspectives, and demonstrates its utility in improving knowledge-intensive tasks.
Findings
Factuality of generated knowledge has limited impact on downstream tasks.
Relevance and coherence of the generated knowledge matter more to downstream performance than factual accuracy.
Prompt engineering and knowledge selection improve task performance.
Abstract
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks when being prompted to generate world knowledge. However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge. In light of this, we introduce CONNER, a COmpreheNsive kNowledge Evaluation fRamework, designed to systematically and automatically evaluate generated knowledge from six important perspectives -- Factuality, Relevance, Coherence, Informativeness, Helpfulness and Validity. We conduct an extensive empirical analysis of the generated knowledge from three different types of LLMs on two widely studied knowledge-intensive tasks, i.e., open-domain question answering and knowledge-grounded dialogue. Surprisingly, our study reveals that the factuality of generated knowledge, even if lower, does not significantly…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems
