THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models

Mengfei Liang; Archish Arun; Zekun Wu; Cristian Munoz; Jonathan Lutch; Emre Kazim; Adriano Koshiyama; Philip Treleaven

arXiv:2409.11353·cs.CL·January 22, 2025

TL;DR

THaMES is an integrated framework that automates the evaluation and mitigation of hallucinations in large language models, improving their factual accuracy across various tasks with adaptable strategies.

Contribution

It introduces an end-to-end, standardized pipeline for hallucination detection and mitigation, combining automated test set generation, benchmarking, and multiple mitigation techniques.

Findings

01

Commercial models like GPT-4o benefit more from retrieval-augmented generation (RAG) than from in-context learning (ICL).

02

Open-weight models like Llama-3.1-8B-Instruct gain more from ICL than from RAG.

03

Parameter-efficient fine-tuning (PEFT) improves Llama-3.1-8B-Instruct's performance.

Abstract

Hallucination, the generation of factually incorrect content, is a growing challenge in Large Language Models (LLMs). Existing detection and mitigation methods are often isolated and insufficient for domain-specific needs, lacking a standardized pipeline. This paper introduces THaMES (Tool for Hallucination Mitigations and EvaluationS), an integrated framework and library addressing this gap. THaMES offers an end-to-end solution for evaluating and mitigating hallucinations in LLMs, featuring automated test set generation, multifaceted benchmarking, and adaptable mitigation strategies. It automates test set creation from any corpus, ensuring high data quality, diversity, and cost-efficiency through techniques like batch processing, weighted sampling, and counterfactual validation. THaMES assesses a model's ability to detect and reduce hallucinations across various tasks, including text…
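The abstract names weighted sampling and batch processing among the techniques used to keep generated test sets diverse and inexpensive. The sketch below illustrates the general idea only; it is not taken from the THaMES codebase. The function name `sample_chunks`, the inverse-usage weighting, and the `usage_counts` bookkeeping are all assumptions made for illustration.

```python
import random
from collections import Counter


def sample_chunks(chunks, usage_counts, k, seed=None):
    """Draw up to k distinct chunk indices, favouring chunks used less often.

    Hypothetical weighting (an assumption, not the paper's formula):
    weight = 1 / (1 + number of times the chunk was already sampled).
    """
    rng = random.Random(seed)
    available = list(range(len(chunks)))
    picked = []
    for _ in range(min(k, len(available))):
        weights = [1.0 / (1 + usage_counts[i]) for i in available]
        choice = rng.choices(available, weights=weights, k=1)[0]
        picked.append(choice)
        available.remove(choice)  # sample without replacement within a batch
    for i in picked:
        usage_counts[i] += 1  # remember usage so later batches diversify
    return picked


# Example: build two batches of source chunks for question generation.
corpus = ["chunk A", "chunk B", "chunk C", "chunk D"]
counts = Counter()
first_batch = sample_chunks(corpus, counts, k=2, seed=0)
second_batch = sample_chunks(corpus, counts, k=2, seed=1)
```

Because the weights shrink as a chunk is reused, later batches tend to draw on parts of the corpus that earlier batches missed, which is one simple way to trade a little randomness for broader coverage.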

Code & Models

Repositories

holistic-ai/THaMES
Official

Taxonomy

Topics: Mental Health Research Topics · Pharmacovigilance and Adverse Drug Reactions · Epilepsy research and treatment

Methods: Sparse Evolutionary Training · Attention Dropout · WordPiece · Dense Connections · Residual Connection · Linear Layer · Multi-Head Attention · Linear Warmup With Linear Decay · Adam