Confident RAG: Enhancing the Performance of LLMs for Mathematics Question Answering through Multi-Embedding and Confidence Scoring

Shiting Chen; Zijian Zhao; Jinsong Chen

arXiv:2507.17442 · cs.CL · December 2, 2025

TL;DR

This paper introduces Confident RAG, a method that improves mathematical question answering by generating multiple answers using multiple embedding models and selecting the one with the highest confidence score, yielding accuracy gains of roughly 10% over vanilla LLMs and roughly 5% over vanilla RAG.

Contribution

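The paper proposes and examines two approaches that combine multiple embedding models for math QA: Mixture-Embedding RAG, which fuses retrieved documents and shows limited gains, and Confident RAG, which generates multiple answers and selects the one with the highest confidence score.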

Findings

01. Confident RAG improves accuracy by ~10% over vanilla LLMs.

02. Confident RAG outperforms vanilla RAG by ~5%.

03. The approach is effective across different models and embeddings.

Abstract

Large Language Models (LLMs) hold significant promise for mathematics education, yet they often struggle with complex mathematical reasoning. While Retrieval-Augmented Generation (RAG) mitigates these issues by grounding LLMs in external knowledge, its effectiveness remains unstable, heavily dependent on the choice of a single embedding model. Moving beyond static RAG workflows, we draw on agentic workflow patterns, a paradigm that introduces structured task decomposition and collaboration to enhance system performance. We propose and examine two novel approaches that combine the benefits of multiple embedding models. While our Mixture-Embedding RAG approach (fusing retrieved documents) shows limited gains, our Confident RAG method (generating multiple answers and selecting the one with the highest confidence score) demonstrates significant improvement. Experimental results show that…
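
The selection step described in the abstract (generate several answers, keep the one with the highest confidence score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `confident_rag`, `Candidate`, and `geometric_mean_probability` are hypothetical, the retrievers and generator are toy stand-ins for real embedding-based retrieval and an LLM call, and the confidence score used here (geometric mean of token probabilities) is an assumed placeholder, since the abstract does not say which metric the paper uses.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases for illustration only.
Retriever = Callable[[str], List[str]]                           # question -> retrieved passages
Generator = Callable[[str, List[str]], Tuple[str, List[float]]]  # -> (answer, token log-probs)


@dataclass
class Candidate:
    embedding_name: str
    answer: str
    confidence: float


def geometric_mean_probability(token_logprobs: List[float]) -> float:
    """Assumed confidence score: exp of the mean token log-probability."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def confident_rag(
    question: str,
    retrievers: Dict[str, Retriever],
    generate: Generator,
) -> Candidate:
    """Run one RAG pass per embedding model and keep the most confident answer."""
    if not retrievers:
        raise ValueError("at least one retriever is required")
    candidates = []
    for name, retrieve in retrievers.items():
        passages = retrieve(question)                    # retrieval with this embedding model
        answer, logprobs = generate(question, passages)  # one answer per embedding model
        candidates.append(
            Candidate(name, answer, geometric_mean_probability(logprobs))
        )
    return max(candidates, key=lambda c: c.confidence)


# Toy stand-ins so the sketch runs without any model or vector store.
toy_retrievers: Dict[str, Retriever] = {
    "embed_a": lambda q: ["unrelated note"],
    "embed_b": lambda q: ["2 + 2 = 4"],
}


def toy_generate(question: str, passages: List[str]) -> Tuple[str, List[float]]:
    # Pretend the model is more certain when a relevant passage was retrieved.
    if "2 + 2 = 4" in passages:
        return "4", [-0.1, -0.1]
    return "5", [-1.2, -0.9]


best = confident_rag("What is 2 + 2?", toy_retrievers, toy_generate)
```

In a real pipeline, each retriever would wrap a different embedding model over the same corpus, and `generate` would return the log-probabilities reported by the LLM. By contrast, the Mixture-Embedding RAG variant mentioned in the abstract fuses the retrieved documents before a single generation pass instead of selecting among several answers.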

Taxonomy

Topics: AI-based Problem Solving and Planning