Does Self-Consistency Improve the Recall of Encyclopedic Knowledge?
Sho Hoshino, Ukyo Honda, Peinan Zhang

TL;DR
This paper investigates whether self-consistency improves the recall of encyclopedic knowledge in language models. It establishes a knowledge recall split of MMLU for evaluation and reports consistent accuracy gains on it.
Contribution
It introduces and validates a knowledge recall split for MMLU, shows that self-consistency improves knowledge recall performance, and reports 89% accuracy on MMLU, the best to date with GPT-4o.
Findings
Self-consistency improves performance on knowledge recall tasks.
On the new split, the symbolic reasoning and knowledge recall subsets mirror the performance patterns of GSM8K and MedMCQA, respectively.
Self-consistency reaches 89% accuracy on MMLU with GPT-4o.
Abstract
While self-consistency is known to improve performance on symbolic reasoning, its effect on the recall of encyclopedic knowledge is unclear due to a lack of targeted evaluation grounds. To address this, we establish such a knowledge recall split for the popular MMLU benchmark by applying a data-driven heuristic from prior work. We validate this split by showing that the performance patterns on the symbolic reasoning and knowledge recall subsets mirror those of GSM8K and MedMCQA, respectively. Using this solid ground, we find that self-consistency consistently improves performance across both symbolic reasoning and knowledge recall, even though its underlying CoT prompting is primarily effective for symbolic reasoning. As a result, we achieve an 89% accuracy on MMLU, the best performance to date with the use of GPT-4o.
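Self-consistency, as it is commonly described, samples several chain-of-thought responses to the same question and keeps the most frequent final answer. The sketch below illustrates that voting step only. It is not the authors' implementation: `sample_answer` is a hypothetical callable standing in for a model call that returns one extracted answer letter per sample, and `n_samples` is an illustrative parameter whose value is not taken from the paper.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def self_consistency(
    sample_answer: Callable[[str], str],  # hypothetical: one sampled CoT answer
    question: str,
    n_samples: int = 5,  # illustrative default, not a value from the paper
) -> str:
    """Sample n_samples answers to the same question and majority-vote them."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    return majority_vote(answers)
```

For a multiple-choice benchmark such as MMLU, each sampled answer would be one of the option letters, so the vote reduces to counting letters across samples.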
