Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection

Moxin Li; Wenjie Wang; Fuli Feng; Fengbin Zhu; Qifan Wang; Tat-Seng Chua

arXiv:2403.09972·cs.CL·September 30, 2024·5 cites

Open Access

TL;DR

This paper introduces a new self-detection framework for LLMs that evaluates multiple candidate answers through reflection and justification, reducing over-trust in incorrect outputs and improving trustworthiness assessment.

Contribution

It proposes a comprehensive answer reflection paradigm and a two-step framework that enhances self-detection by considering multiple answers and their justifications.

Findings

01 Effective in reducing over-trust in incorrect answers

02 Improves self-detection accuracy across multiple datasets

03 Seamlessly integrates with existing LLM self-evaluation methods

Abstract

Self-detection for Large Language Models (LLMs) seeks to evaluate the trustworthiness of an LLM's output by leveraging the model's own capabilities, thereby alleviating the issue of output hallucination. However, existing self-detection approaches only retrospectively evaluate answers generated by the LLM, which typically leads to over-trust in incorrectly generated answers. To tackle this limitation, we propose a novel self-detection paradigm that considers the comprehensive answer space beyond LLM-generated answers. It thoroughly compares the trustworthiness of multiple candidate answers to mitigate over-trust in LLM-generated incorrect answers. Building upon this paradigm, we introduce a two-step framework, which first instructs the LLM to reflect and provide justifications for each candidate answer, and then aggregates the justifications for a comprehensive evaluation of the target answer. This…
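The two-step framework described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the `LLM` callable type, the function names, and the fallback score of 0.5 are all assumptions made for illustration, and the page does not give the paper's actual prompts or aggregation details.

```python
from typing import Callable, Dict, List

# Hypothetical LLM interface (an assumption): prompt text in, generated text out.
LLM = Callable[[str], str]


def justify_candidates(llm: LLM, question: str, candidates: List[str]) -> Dict[str, str]:
    """Step 1: ask the model to reflect on and justify every candidate answer."""
    justifications = {}
    for answer in candidates:
        prompt = (
            f"Question: {question}\n"
            f"Candidate answer: {answer}\n"
            "Explain why this answer could be correct."
        )
        justifications[answer] = llm(prompt)
    return justifications


def evaluate_target(llm: LLM, question: str, target: str,
                    justifications: Dict[str, str]) -> float:
    """Step 2: show all justifications together, then score the target answer."""
    evidence = "\n".join(f"- {a}: {j}" for a, j in justifications.items())
    prompt = (
        f"Question: {question}\n"
        f"Justifications for each candidate:\n{evidence}\n"
        f"How likely is it that '{target}' is correct? "
        "Reply with a single number between 0 and 1."
    )
    reply = llm(prompt)
    try:
        score = float(reply.strip())
    except ValueError:
        return 0.5  # assumed fallback when the reply is not a number
    return min(1.0, max(0.0, score))  # clamp to [0, 1]


def stub_llm(prompt: str) -> str:
    """Stand-in model so the sketch runs without any API access."""
    if "single number" in prompt:
        return "0.2"
    return "stub justification"
```

In use, the candidate list would contain the LLM-generated answer plus alternatives, for example `evaluate_target(stub_llm, q, target, justify_candidates(stub_llm, q, candidates))`. The point of the design is that the target answer is scored only after the model has seen justifications for competing answers, rather than being evaluated in isolation.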

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Speech and dialogue systems