Formal Language Knowledge Corpus for Retrieval Augmented Generation
Majd Zayyad, Yossi Adi

TL;DR
This paper investigates using the Lean proof language to build a knowledge corpus for retrieval-augmented generation systems, aiming to enhance LLMs' performance in complex mathematical reasoning tasks.
Contribution
It introduces a novel approach of leveraging Lean to create a specialized knowledge corpus for RAG systems focused on mathematical reasoning.
Findings
Potential for improved reasoning in LLMs using Lean-based knowledge corpus
Lays groundwork for future research in RAG-enhanced logical reasoning
Highlights the need for specialized corpora in advanced reasoning tasks
Abstract
The integration of retrieval-augmented techniques with LLMs has shown promise in improving performance across various domains. However, their utility in tasks requiring advanced reasoning, such as generating and evaluating mathematical statements and proofs, remains underexplored. This study explores the use of Lean, a programming language for writing mathematical proofs, to populate the knowledge corpus used by RAG systems. We hope for this to lay the foundation to exploring different methods of using RAGs to improve the performance of LLMs in advanced logical reasoning tasks.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Semantic Web and Ontologies
MethodsAttention Is All You Need · Linear Layer · Residual Connection · Adam · Weight Decay · Multi-Head Attention · Layer Normalization · WordPiece · Dropout · Softmax
