# Decoding Memories: An Efficient Pipeline for Self-Consistency Hallucination Detection

**Authors:** Weizhi Gao, Xiaorui Liu, Feiyi Wang, Dan Lu, Junqi Yin

arXiv: 2508.21228 · 2025-09-01

## TL;DR

This paper introduces the Decoding Memory Pipeline (DMP), which speeds up self-consistency hallucination detection in large language models by removing redundant computation across repeated generations, in particular the prefix tokens that sampled responses share. It reports up to a 3x speedup without sacrificing AUROC.

## Contribution

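The paper presents what the authors describe as the first study of redundancy in self-consistency methods. It finds that generations share prefix tokens and that non-exact-answer tokens contribute minimally to semantic content, and it uses these observations to accelerate multi-response generation through selective inference and annealed decoding.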

## Key findings

- Achieves up to a 3x speedup in multi-response generation
- Does so without sacrificing AUROC performance
- Is orthogonal to the model, dataset, decoding strategy, and self-consistency baseline, and holds promise for extension to alignment and reasoning tasks
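
To make the AUROC metric in these findings concrete, here is a minimal sketch of how a self-consistency score can be evaluated as a hallucination detector. It is an illustration only, not the paper's method or any of its baselines: `disagreement_score` is a hypothetical exact-match stand-in for a self-consistency score, and the scores and labels in the usage note are made up.

```python
from collections import Counter
from typing import Sequence


def disagreement_score(answers: Sequence[str]) -> float:
    """Hypothetical self-consistency score: 1 minus the share of the majority answer.

    Higher values mean the sampled answers disagree more, which is treated
    here as a sign of possible hallucination (an assumption for illustration).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    majority_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - majority_count / len(answers)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a positive example outranks a negative one.

    labels use 1 for "hallucinated" and 0 for "correct"; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `disagreement_score(["Paris", "Paris", "Lyon", "Paris"])` gives `0.25`, and `auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` gives `0.75`. An acceleration method that "does not sacrifice AUROC" would leave this number roughly unchanged while producing the sampled answers faster.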

## Abstract

Large language models (LLMs) have demonstrated impressive performance in both research and real-world applications, but they still struggle with hallucination. Existing hallucination detection methods often perform poorly on sentence-level generation or rely heavily on domain-specific knowledge. While self-consistency approaches help address these limitations, they incur high computational costs due to repeated generation. In this paper, we conduct the first study on identifying redundancy in self-consistency methods, manifested as shared prefix tokens across generations, and observe that non-exact-answer tokens contribute minimally to the semantic content. Based on these insights, we propose a novel Decoding Memory Pipeline (DMP) that accelerates generation through selective inference and annealed decoding. Being orthogonal to the model, dataset, decoding strategy, and self-consistency baseline, our DMP consistently improves the efficiency of multi-response generation and holds promise for extension to alignment and reasoning tasks. Extensive experiments show that our method achieves up to a 3x speedup without sacrificing AUROC performance.
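
The redundancy the abstract describes, shared prefix tokens across generations, can be measured with a simple prefix tree. The sketch below is not DMP itself and does not implement selective inference or annealed decoding. It only counts how many token positions in a set of sampled responses repeat a prefix already produced by an earlier response. The function name and the whitespace tokenization in the usage note are assumptions for illustration.

```python
from typing import Dict, Sequence


def count_prefix_redundancy(responses: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Count token positions that repeat a prefix seen in an earlier response.

    Each response is a sequence of tokens. A token position is "unique" the
    first time its full prefix path appears, and "redundant" otherwise.
    """
    trie: dict = {}  # nested dicts: token -> subtree of continuations
    total = 0
    unique = 0
    for tokens in responses:
        node = trie
        for tok in tokens:
            total += 1
            if tok not in node:
                # First time this prefix is extended by this token.
                node[tok] = {}
                unique += 1
            node = node[tok]
    redundant = total - unique
    return {
        "total": total,
        "unique": unique,
        "redundant": redundant,
        "redundant_fraction": redundant / total if total else 0.0,
    }


if __name__ == "__main__":
    # Made-up sampled responses, split on whitespace for simplicity.
    samples = [
        "The capital of France is Paris".split(),
        "The capital of France is Lyon".split(),
        "The capital city is Paris".split(),
    ]
    print(count_prefix_redundancy(samples))
```

In this toy input, 7 of the 17 token positions repeat an already-generated prefix. Positions like these are the kind of repeated work that a pipeline reusing earlier decoding results could in principle avoid.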

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21228/full.md

## Figures

24 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21228/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/2508.21228/full.md

---
Source: https://tomesphere.com/paper/2508.21228