DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation
A B M Ashikur Rahman, Saeed Anwar, Muhammad Usman, Ajmal Mian

TL;DR
This paper introduces DefAn, a benchmark dataset of over 75,000 prompts designed to measure hallucinations in large language models across multiple domains.
Contribution
The paper presents a new extensive benchmark dataset for assessing LLM hallucinations, addressing limitations of existing small, multiple-choice datasets, and provides evaluation results for several prominent LLMs.
Findings
The evaluated LLMs exhibit hallucination rates ranging from 57% to 82%.
Performance drops significantly on numeric and domain-specific questions.
The dataset effectively evaluates LLM factual accuracy and consistency.
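The findings above refer to hallucination rates and response consistency. As a rough illustration of how such quantities might be computed for a single prompt with a definitive answer, here is a minimal Python sketch. It is not the paper's evaluation code: the function names, the substring-matching rule, and the sample responses are all assumptions made for illustration.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def hallucination_rate(responses: list[str], reference: str) -> float:
    """Fraction of responses that do not contain the reference answer.

    Assumption: a response counts as correct if the normalized reference
    appears anywhere in it. A real benchmark may use a stricter rule.
    """
    ref = normalize(reference)
    misses = sum(1 for r in responses if ref not in normalize(r))
    return misses / len(responses)


def consistency(responses: list[str]) -> float:
    """Share of responses identical (after normalization) to the most common one."""
    counts = Counter(normalize(r) for r in responses)
    return counts.most_common(1)[0][1] / len(responses)


# Hypothetical responses to the same prompt asked four times.
sample_responses = ["1969", "In 1969", "1968", "1969"]
sample_reference = "1969"

rate = hallucination_rate(sample_responses, sample_reference)
agreement = consistency(sample_responses)
```

In this sketch, one of the four sample responses lacks the reference answer, and two of the four are identical after normalization. Substring matching is a deliberately simple choice; it works only because the prompts are assumed to have short, definitive answers.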
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities, revolutionizing the integration of AI in daily life applications. However, they are prone to hallucinations, generating claims that contradict established facts, deviating from prompts, and producing inconsistent responses when the same prompt is presented multiple times. Addressing these issues is challenging due to the lack of comprehensive and easily assessable benchmark datasets. Most existing datasets are small and rely on multiple-choice questions, which are inadequate for evaluating the generative prowess of LLMs. To measure hallucination in LLMs, this paper introduces a comprehensive benchmark dataset comprising over 75,000 prompts across eight domains. These prompts are designed to elicit definitive, concise, and informative answers. The dataset is divided into two segments: one publicly available for…
Taxonomy
Topics: Cryptography and Residue Arithmetic · Big Data and Digital Economy · Radioactive Decay and Measurement Techniques
Methods: LLaMA
