Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost

Masha Belyi; Robert Friel; Shuai Shao; Atindriyo Sanyal

arXiv:2406.00975·cs.CL·June 6, 2024·1 cites

TL;DR

Luna is a lightweight, fine-tuned DeBERTa-large model that detects hallucinations in RAG systems with high accuracy at significantly lower cost and latency, making it suitable for diverse industry applications.

Contribution

The paper introduces Luna, an efficient hallucination detection model for RAG systems that outperforms GPT-3.5 and commercial evaluation frameworks on the detection task while reducing cost and latency.

Findings

01

Luna achieves a 97% reduction in detection cost.

02

Luna reduces latency by 91%.

03

Luna generalizes across multiple industry domains.

Abstract

Retriever Augmented Generation (RAG) systems have become pivotal in enhancing the capabilities of language models by incorporating external knowledge retrieval mechanisms. However, a significant challenge in deploying these systems in industry applications is the detection and mitigation of hallucinations: instances where the model generates information that is not grounded in the retrieved context. Addressing this issue is crucial for ensuring the reliability and accuracy of responses generated by large language models (LLMs) in diverse industry settings. Current hallucination detection techniques fail to deliver accuracy, low latency, and low cost simultaneously. We introduce Luna: a DeBERTa-large (440M) encoder, finetuned for hallucination detection in RAG settings. We demonstrate that Luna outperforms GPT-3.5 and commercial evaluation frameworks on the hallucination detection task,…
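
To make the task in the abstract concrete, here is a toy sketch of what "not grounded in the retrieved context" means as an input/output contract. It is not Luna's method: Luna is a finetuned encoder, whereas this stand-in only measures lexical overlap. The function names, the 0.5 threshold, and the example strings are all assumptions made for illustration.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def flag_unsupported(context: str, response: str, threshold: float = 0.5):
    """Toy grounding check (hypothetical helper, NOT the Luna model).

    Splits the response into sentences and, for each one, computes the
    fraction of its tokens that also appear in the retrieved context.
    Sentences whose support falls below `threshold` are flagged.
    Returns a list of (sentence, support, flagged) tuples.
    """
    context_tokens = tokenize(context)
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        tokens = tokenize(sentence)
        if not tokens:
            continue
        support = len(tokens & context_tokens) / len(tokens)
        results.append((sentence, support, support < threshold))
    return results

# Illustrative inputs (invented for this sketch).
context = ("Luna is a DeBERTa-large encoder finetuned for "
           "hallucination detection in RAG settings.")
response = "Luna is a finetuned encoder. It was trained on ten billion images."

report = flag_unsupported(context, response)
for sentence, support, flagged in report:
    print(f"{'UNSUPPORTED' if flagged else 'supported':>11}  {support:.2f}  {sentence}")
```

A lexical check like this misses paraphrases and can be fooled by responses that reuse context words in false claims, which is one reason learned detectors are used for this task instead.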

Taxonomy

Topics: Text Readability and Simplification

Methods: WordPiece · Linear Warmup With Linear Decay · Linear Layer · BART · Adam · Cosine Annealing · Attention Is All You Need · Residual Connection