RAGSmith: A Framework for Finding the Optimal Composition of Retrieval-Augmented Generation Methods Across Datasets
Muhammed Yusuf Kartal (1), Suha Kagan Kose (2), Korhan Sevinç (1), Burak Aktas (2) ((1) TOBB University of Economics and Technology, (2) Roketsan Inc.)

TL;DR
RAGSmith is a modular framework that uses evolutionary search to optimize retrieval-augmented generation pipelines across diverse datasets, improving on naive RAG baselines by +3.8% on average.
Contribution
RAGSmith introduces an end-to-end architecture search for RAG systems, using a genetic algorithm to optimize over nine technique families and 46,080 feasible pipeline configurations.
Findings
RAGSmith outperforms naive RAG baselines by +3.8% on average across domains.
The search explores about 0.2% of the configuration space, roughly 100 candidates.
A robust backbone of vector retrieval plus post-generation reflection is commonly selected.
Abstract
Retrieval-Augmented Generation (RAG) quality depends on many interacting choices across retrieval, ranking, augmentation, prompting, and generation, so optimizing modules in isolation is brittle. We introduce RAGSmith, a modular framework that treats RAG design as an end-to-end architecture search over nine technique families and 46,080 feasible pipeline configurations. A genetic search optimizes a scalar objective that jointly aggregates retrieval metrics (recall@k, mAP, nDCG, MRR) and generation metrics (LLM-Judge and semantic similarity). We evaluate on six Wikipedia-derived domains (Mathematics, Law, Finance, Medicine, Defense Industry, Computer Science), each with 100 questions spanning factual, interpretation, and long-answer types. RAGSmith finds configurations that consistently outperform the naive RAG baseline by +3.8% on average (range +1.2% to +6.9% across domains), with…
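The abstract describes a genetic search over pipeline configurations that maximizes a single scalar objective built from retrieval and generation metrics. The Python sketch below illustrates that general idea only. It is not the authors' implementation: the module names and options in `SEARCH_SPACE`, the equal-weight averaging in `scalar_objective`, the selection, crossover, and mutation settings, and the `toy_evaluate` scoring function are all assumptions made for illustration. The toy search space here has 36 configurations, not the 46,080 reported in the paper.

```python
import random

# Hypothetical search space: families and options are illustrative, not the paper's.
SEARCH_SPACE = {
    "retrieval": ["vector", "bm25", "hybrid"],
    "reranker": ["none", "cross_encoder"],
    "query_expansion": ["none", "hyde", "multi_query"],
    "post_generation": ["none", "reflection"],
}


def scalar_objective(retrieval_metrics, generation_metrics, w_retrieval=0.5):
    """Combine two metric groups into one score (the weighting is an assumption)."""
    r = sum(retrieval_metrics.values()) / len(retrieval_metrics)
    g = sum(generation_metrics.values()) / len(generation_metrics)
    return w_retrieval * r + (1.0 - w_retrieval) * g


def random_config(rng):
    """Draw one option per module family uniformly at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each module choice comes from one of the two parents."""
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}


def mutate(cfg, rng, rate=0.2):
    """Resample each module choice with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in cfg.items()
    }


def genetic_search(evaluate, population_size=10, generations=10, seed=0):
    """Return (best_config, best_score, number_of_distinct_configs_evaluated)."""
    rng = random.Random(seed)
    cache = {}  # each distinct configuration is evaluated at most once

    def fitness(cfg):
        key = tuple(sorted(cfg.items()))
        if key not in cache:
            cache[key] = evaluate(cfg)
        return cache[key]

    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]  # keep the top half (elitism)
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children

    best = max(population, key=fitness)
    return best, fitness(best), len(cache)


def toy_evaluate(cfg):
    """Stand-in for running a real pipeline; the numbers are arbitrary, not measurements."""
    retrieval = {
        "recall@k": 0.6 + 0.2 * (cfg["retrieval"] == "vector"),
        "nDCG": 0.5 + 0.2 * (cfg["reranker"] == "cross_encoder"),
    }
    generation = {
        "llm_judge": 0.6 + 0.3 * (cfg["post_generation"] == "reflection"),
        "semantic_similarity": 0.7,
    }
    return scalar_objective(retrieval, generation)


if __name__ == "__main__":
    best, score, evaluated = genetic_search(toy_evaluate)
    print(best, round(score, 3), evaluated)
```

In a real setting, `toy_evaluate` would be replaced by a function that builds the pipeline described by `cfg`, runs it on the evaluation questions, and returns the aggregated score. Caching by configuration matters there because each evaluation is expensive, and it also makes it easy to report how small a fraction of the space the search actually visited.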
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Biomedical Text Mining and Ontologies
