Confident RAG: Enhancing the Performance of LLMs for Mathematics Question Answering through Multi-Embedding and Confidence Scoring
Shiting Chen, Zijian Zhao, Jinsong Chen

TL;DR
This paper introduces Confident RAG, a method that improves mathematical question answering by drawing on multiple embedding models, generating multiple answers, and selecting the one with the highest confidence score. The reported accuracy gains are roughly 10% over vanilla LLMs and roughly 5% over vanilla RAG.
Contribution
The paper proposes Confident RAG, a novel approach that combines multiple answer generation with confidence scoring to enhance LLM performance in math QA tasks.
Findings
Confident RAG improves accuracy by ~10% over vanilla LLMs.
Confident RAG outperforms vanilla RAG by ~5%.
The approach is effective across different LLMs and embedding models.
Abstract
Large Language Models (LLMs) hold significant promise for mathematics education, yet they often struggle with complex mathematical reasoning. While Retrieval-Augmented Generation (RAG) mitigates these issues by grounding LLMs in external knowledge, its effectiveness remains unstable, heavily dependent on the choice of a single embedding model. Moving beyond static RAG workflows, we draw on agentic workflow patterns, a paradigm that introduces structured task decomposition and collaboration to enhance system performance. We propose and examine two novel approaches that combine the benefits of multiple embedding models. While our Mixture-Embedding RAG approach (fusing retrieved documents) shows limited gains, our Confident RAG method (generating multiple answers and selecting the one with the highest confidence score) demonstrates significant improvement. Experimental results show that…
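The selection step that the abstract describes for Confident RAG (generate multiple answers, keep the one with the highest confidence score) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes one candidate answer is produced per embedding model, and the names `Candidate`, `confident_rag`, `toy_retrievers`, and `toy_generate` are hypothetical. How the confidence score is computed is not stated in the abstract, so the sketch treats it as an opaque number returned by the generator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    embedding: str     # name of the embedding model used for retrieval
    answer: str        # answer generated from that model's retrieved documents
    confidence: float  # confidence score for the answer (definition assumed)


def confident_rag(
    question: str,
    retrievers: Dict[str, Callable[[str], List[str]]],
    generate: Callable[[str, List[str]], Tuple[str, float]],
) -> Candidate:
    """Run one RAG pass per retriever and keep the most confident answer."""
    if not retrievers:
        raise ValueError("at least one retriever is required")
    candidates = []
    for name, retrieve in retrievers.items():
        docs = retrieve(question)                       # retrieval with one embedding model
        answer, confidence = generate(question, docs)   # LLM call (stubbed below)
        candidates.append(Candidate(name, answer, confidence))
    # Selection step: highest confidence wins.
    return max(candidates, key=lambda c: c.confidence)


# Toy stand-ins so the sketch runs without any model or index (all hypothetical).
toy_retrievers = {
    "emb_a": lambda q: ["doc about fractions"],
    "emb_b": lambda q: ["doc about decimals"],
}


def toy_generate(question: str, docs: List[str]) -> Tuple[str, float]:
    canned = {
        "doc about fractions": ("3/4", 0.62),
        "doc about decimals": ("0.75", 0.91),
    }
    return canned[docs[0]]


best = confident_rag("What is 3 divided by 4?", toy_retrievers, toy_generate)
print(best.embedding, best.answer, best.confidence)
```

In a real pipeline each retriever would query a vector index built with a different embedding model, and `generate` would call the LLM and derive a confidence value from its output. The Mixture-Embedding RAG variant mentioned in the abstract would instead merge the retrieved documents before a single generation call.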
Taxonomy
Topics: AI-based Problem Solving and Planning
