Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Niels Mündler; Jingxuan He; Slobodan Jenko; Martin Vechev

arXiv:2305.15852 · cs.CL · March 19, 2024 · 46 cites

TL;DR

This paper investigates self-contradictions in large language models and presents methods for evaluating, detecting, and mitigating them. The methods are model-agnostic and publicly available, and they significantly reduce self-contradictory hallucinations in generated text.

Contribution

The paper introduces a novel prompting-based framework for detecting and mitigating self-contradictions in instruction-tuned LMs. The framework works without external knowledge retrieval.

Findings

01

Self-contradictions occur in 17.7% of all sentences produced by ChatGPT.

02

The detector achieves an F1 score of around 80% when prompting ChatGPT.

03

The mitigation algorithm effectively reduces contradictions while maintaining fluency.
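
For reference, the F1 score in finding 02 is the harmonic mean of precision and recall. The definition below is the standard one. The symbols TP, FP, and FN (true positives, false positives, and false negatives of the contradiction detector) are notation introduced here, and this page does not report precision and recall separately, so no values are filled in.

```latex
F_1 = \frac{2\,P\,R}{P + R},
\qquad
P = \frac{TP}{TP + FP},
\qquad
R = \frac{TP}{TP + FN}
```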

Abstract

Large language models (large LMs) are susceptible to producing text that contains hallucinated content. An important instance of this problem is self-contradiction, where the LM generates two contradictory sentences within the same context. In this work, we present a comprehensive investigation into self-contradiction for various instruction-tuned LMs, covering evaluation, detection, and mitigation. Our primary evaluation task is open-domain text generation, but we also demonstrate the applicability of our approach to shorter question answering. Our analysis reveals the prevalence of self-contradictions, e.g., in 17.7% of all sentences produced by ChatGPT. We then propose a novel prompting-based framework designed to effectively detect and mitigate self-contradictions. Our detector achieves high accuracy, e.g., around 80% F1 score when prompting ChatGPT. The mitigation algorithm…
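
The abstract describes the framework only at a high level, so the following is a minimal sketch of what a prompting-based detect-then-mitigate step could look like, not the authors' implementation. It assumes a self-contradiction is checked by sampling a second sentence for the same context and asking the model whether the two conflict. The `LM` type, all function names, the prompt wording, and the canned `toy_lm` stand-in are hypothetical and are not taken from the paper or from the eth-sri/chatprotect repository.

```python
from typing import Callable, Tuple

# Hypothetical interface: any function that maps a prompt to a completion.
LM = Callable[[str], str]


def sample_alternative(lm: LM, context: str) -> str:
    """Ask the model for another sentence written for the same context."""
    return lm(f"Context: {context}\nWrite the next sentence.").strip()


def contradicts(lm: LM, context: str, a: str, b: str) -> bool:
    """Ask the model whether two sentences for the same context conflict."""
    prompt = (
        f"Context: {context}\n"
        f"Sentence A: {a}\n"
        f"Sentence B: {b}\n"
        "Do sentences A and B contradict each other? Answer Yes or No."
    )
    return lm(prompt).strip().lower().startswith("yes")


def revise(lm: LM, context: str, a: str, b: str) -> str:
    """Ask the model to drop the information in A that conflicts with B."""
    prompt = (
        f"Context: {context}\n"
        f"Sentence A: {a}\n"
        f"Sentence B: {b}\n"
        "Rewrite sentence A, removing any information that conflicts "
        "with sentence B."
    )
    return lm(prompt).strip()


def detect_and_mitigate(lm: LM, context: str, sentence: str) -> Tuple[str, bool]:
    """Return (possibly revised sentence, whether a contradiction was flagged)."""
    alternative = sample_alternative(lm, context)
    if contradicts(lm, context, sentence, alternative):
        return revise(lm, context, sentence, alternative), True
    return sentence, False


def toy_lm(prompt: str) -> str:
    """Canned stand-in for a real model so the sketch runs offline."""
    if "Answer Yes or No" in prompt:
        # Pretend the model notices the two conflicting years.
        return "Yes" if "1985" in prompt and "1990" in prompt else "No"
    if "Rewrite sentence A" in prompt:
        return "The bridge opened to traffic."
    return "The bridge opened in 1990."


if __name__ == "__main__":
    print(detect_and_mitigate(toy_lm, "A text about the bridge.",
                              "The bridge opened in 1985."))
```

In practice `toy_lm` would be replaced by a call to an instruction-tuned model. Passing the model in as a plain callable keeps the sketch independent of any particular model or API.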

Code & Models

Repositories

eth-sri/chatprotect
PyTorch · Official

Videos

Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation · SlidesLive

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam · Residual Connection