FACTOID: FACtual enTailment fOr hallucInation Detection
Vipula Rawte, S.M Towhidul Islam Tonmoy, Krishnav Rajbangshi, Shravani Nag, Aman Chadha, Amit P. Sheth, Amitava Das

TL;DR
This paper introduces FACTOID, a new factual entailment framework and benchmark for detecting hallucinations in LLM outputs. It improves accuracy over state-of-the-art textual entailment methods and provides a way to rank LLMs by their vulnerability to hallucination.
Contribution
It proposes a novel Factual Entailment task, a benchmark dataset, and a multi-task learning approach that enhances hallucination detection in LLMs.
Findings
The multi-task learning (MTL) framework improves factual entailment (FE) accuracy by 40% over state-of-the-art (SOTA) textual entailment (TE) methods.
FACTOID benchmark enables effective evaluation of hallucination detection.
The Auto Hallucination Vulnerability Index (HVI_auto) ranks LLMs by their vulnerability to hallucination.
Abstract
The widespread adoption of Large Language Models (LLMs) has brought numerous benefits. However, hallucination is a significant concern. In response, Retrieval Augmented Generation (RAG) has emerged as a highly promising paradigm to improve LLM outputs by grounding them in factual information. RAG relies on textual entailment (TE) or similar methods to check whether LLM-generated text is supported or contradicted by the retrieved documents. This paper argues that conventional TE methods are inadequate for spotting hallucinations in content generated by LLMs. For instance, consider a prompt about the "USA's stance on the Ukraine war". The AI-generated text states, "...U.S. President Barack Obama says the U.S. will not put troops in Ukraine..." However, during the war the U.S. president was Joe Biden, so this statement contradicts factual reality. Moreover, current TE systems are unable to…
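The difference between a single sentence-level verdict and pinpointing which part of a sentence is wrong can be illustrated with a toy example. This is not the FACTOID method or a real TE model. It assumes a hypothetical retrieved passage that differs from the generated sentence only in the president's name, and it uses plain word-level alignment from Python's standard `difflib` to locate the differing span. The helper names `contradicted_spans` and `sentence_label` are illustrative only. Because it compares surface words, it would wrongly flag any paraphrase as a contradiction, so it only shows the shape of a span-level answer versus a one-label answer.

```python
from difflib import SequenceMatcher


def contradicted_spans(generated: str, reference: str) -> list[str]:
    """Return word spans of `generated` that have no aligned match in `reference`.

    Toy surface-level alignment (hypothetical helper); not an entailment model.
    """
    gen_tokens = generated.split()
    ref_tokens = reference.split()
    matcher = SequenceMatcher(a=gen_tokens, b=ref_tokens, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'replace' and 'delete' mark generated tokens with no counterpart in the reference
        if tag in ("replace", "delete"):
            spans.append(" ".join(gen_tokens[i1:i2]))
    return spans


def sentence_label(generated: str, reference: str) -> str:
    # Coarse, TE-style verdict: one label for the whole sentence, no location
    return "contradicted" if contradicted_spans(generated, reference) else "supported"


# Generated text from the abstract's example; the reference is a hypothetical retrieved passage
generated = "U.S. President Barack Obama says the U.S. will not put troops in Ukraine"
reference = "U.S. President Joe Biden says the U.S. will not put troops in Ukraine"

print(sentence_label(generated, reference))      # contradicted
print(contradicted_spans(generated, reference))  # ['Barack Obama']
```

The sentence-level label alone says only that something is wrong. The span output isolates "Barack Obama" as the hallucinated part, which is the kind of finer-grained signal that a sentence-level TE verdict does not provide.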
Taxonomy
Methods: WordPiece · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · Residual Connection · Attention Is All You Need · Cosine Annealing
