DoTA-RAG: Dynamic of Thought Aggregation RAG

Saksorn Ruangtanusak; Natthapath Rungseesiripak; Peerawat Rojratchadakorn; Monthol Charattrakool; Natapong Nitarach

arXiv:2506.12571 · cs.CL · June 17, 2025

TL;DR

DoTA-RAG is a retrieval-augmented generation system built for fast, accurate question answering over large-scale web knowledge indexes. It improves answer correctness and efficiency through query rewriting, dynamic routing to specialized sub-indexes, and a better-performing embedding model.

Contribution

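The paper introduces DoTA-RAG, a three-stage pipeline (query rewriting, dynamic routing to specialized sub-indexes, and multi-stage retrieval and ranking), together with an evaluation of candidate embedding models used to choose the one for re-embedding the corpus. The combination targets high-throughput, accurate retrieval over web-scale knowledge.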
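This page does not say how the candidate embedding models were compared. One common way to choose between embedders is to measure recall@k on a small labelled set of queries whose relevant document is known, then keep the highest-scoring model. The sketch below illustrates that idea with two toy embedders (`bag_of_words` and `length_only`) and a three-document corpus; the embedders, data, metric, and function names are all hypothetical stand-ins, not the models or evaluation protocol used in DoTA-RAG.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]

# Hypothetical vocabulary for the toy bag-of-words embedder.
VOCAB = ["rag", "retrieval", "embedding", "latency", "index", "router"]


def bag_of_words(text: str) -> Vector:
    """Toy embedder: term counts over a fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def length_only(text: str) -> Vector:
    """Deliberately weak toy embedder: encodes only the text length."""
    return [float(len(text)), 1.0]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(embed: Embedder, docs: Sequence[str],
                queries: Sequence[Tuple[str, int]], k: int) -> float:
    """Fraction of queries whose gold document index is in the top-k by cosine."""
    doc_vecs = [embed(d) for d in docs]
    hits = 0
    for query, gold in queries:
        q_vec = embed(query)
        ranked = sorted(range(len(docs)),
                        key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True)
        hits += gold in ranked[:k]
    return hits / len(queries)


def select_embedder(candidates: Dict[str, Embedder], docs: Sequence[str],
                    queries: Sequence[Tuple[str, int]], k: int = 1):
    """Score every candidate on the same queries and return the best one."""
    scores = {name: recall_at_k(fn, docs, queries, k) for name, fn in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical evaluation data: (query, index of the relevant document).
DOCS = [
    "retrieval augmented generation rag",
    "embedding model for the index",
    "latency of the router",
]
QUERIES = [("what is rag", 0), ("which embedding model", 1), ("router latency", 2)]
CANDIDATES = {"bag_of_words": bag_of_words, "length_only": length_only}

if __name__ == "__main__":
    best, scores = select_embedder(CANDIDATES, DOCS, QUERIES, k=1)
    print(best)  # bag_of_words
    print(scores)
```

A real comparison would use a much larger labelled query set and an approximate nearest-neighbour index instead of a full sort, but the selection logic (score each candidate on identical queries, keep the best) stays the same.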

Findings

1. Answer correctness score improved from 0.752 (baseline using the LiveRAG pre-built vector store) to 1.478.
2. Achieved a correctness score of 0.929 on Live Challenge Day.
3. Maintains low latency on large, diverse datasets.
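For scale, the first finding corresponds to roughly a doubling of the baseline score, computed directly from the two reported values:

```latex
\frac{1.478 - 0.752}{0.752} = \frac{0.726}{0.752} \approx 0.965,
\qquad
\frac{1.478}{0.752} \approx 1.97
```

That is, about a 96.5% relative gain over the baseline that uses the LiveRAG pre-built vector store.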

Abstract

In this paper, we introduce DoTA-RAG (Dynamic-of-Thought Aggregation RAG), a retrieval-augmented generation system optimized for high-throughput, large-scale web knowledge indexes. Traditional RAG pipelines often suffer from high latency and limited accuracy over massive, diverse datasets. DoTA-RAG addresses these challenges with a three-stage pipeline: query rewriting, dynamic routing to specialized sub-indexes, and multi-stage retrieval and ranking. We further enhance retrieval by evaluating and selecting a superior embedding model, re-embedding the large FineWeb-10BT corpus. Moreover, we create a diverse Q&A dataset of 500 questions generated via the DataMorgana setup across a broad range of WebOrganizer topics and formats. DoTA-RAG improves the answer correctness score from 0.752 (baseline, using LiveRAG pre-built vector store) to 1.478 while maintaining low latency, and it achieves…
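As a rough illustration of how the three stages named in the abstract fit together, here is a minimal sketch under stated assumptions: the query rewriter only lowercases and drops filler words, the router picks sub-indexes by keyword overlap, and retrieval is a cheap lexical first pass followed by a bigram-aware rerank. The actual system works over a re-embedded FineWeb-10BT corpus, and none of its components are reproduced here; every function, sub-index name, keyword set, and scoring rule below is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical filler words removed during the toy "query rewriting" stage.
STOPWORDS = {"what", "is", "the", "a", "an", "of", "how", "does", "do", "tell", "me", "about"}

# Hypothetical sub-indexes and the topic keywords used to route queries to them.
SUB_INDEXES: Dict[str, List[str]] = {
    "science": [
        "dense retrieval uses embedding vectors",
        "sparse retrieval uses term matching",
        "photosynthesis converts light to energy",
    ],
    "sports": [
        "the marathon is a long distance race",
        "retrieval drills help dogs fetch",
    ],
}
ROUTES: Dict[str, Set[str]] = {
    "science": {"retrieval", "embedding", "vectors", "energy"},
    "sports": {"marathon", "race", "football"},
}


def rewrite_query(query: str) -> List[str]:
    """Stage 1 (toy rewrite): lowercase, strip punctuation, drop filler words."""
    tokens = [t.strip("?.,!").lower() for t in query.split()]
    return [t for t in tokens if t and t not in STOPWORDS]


def route(terms: List[str], routes: Dict[str, Set[str]]) -> List[str]:
    """Stage 2 (toy router): keep sub-indexes whose keywords overlap the query."""
    scored = sorted(((len(set(terms) & kw), name) for name, kw in routes.items()),
                    reverse=True)
    chosen = [name for score, name in scored if score > 0]
    return chosen or list(routes)  # fall back to every sub-index if nothing matches


def first_stage(terms: List[str], docs: List[str], n: int) -> List[str]:
    """Stage 3a: cheap lexical recall, counting query terms present in each doc."""
    scored: List[Tuple[int, str]] = []
    for doc in docs:
        words = set(doc.lower().split())
        hits = sum(t in words for t in terms)
        if hits > 0:
            scored.append((hits, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:n]]


def rerank(terms: List[str], candidates: List[str], k: int) -> List[str]:
    """Stage 3b: finer scoring that also rewards query bigrams appearing in order."""
    def score(doc: str) -> int:
        words = doc.lower().split()
        unigrams = sum(t in words for t in terms)
        doc_pairs = set(zip(words, words[1:]))
        bigrams = sum(pair in doc_pairs for pair in zip(terms, terms[1:]))
        return unigrams + 2 * bigrams
    return sorted(candidates, key=score, reverse=True)[:k]


def retrieve(query: str, n: int = 10, k: int = 2) -> Tuple[List[str], List[str]]:
    """Rewrite -> route -> first-stage retrieval -> rerank; return (routes, context)."""
    terms = rewrite_query(query)
    selected = route(terms, ROUTES)
    pool = [doc for name in selected for doc in SUB_INDEXES[name]]
    return selected, rerank(terms, first_stage(terms, pool, n), k)


if __name__ == "__main__":
    selected, context = retrieve("What is dense retrieval?")
    print(selected)  # ['science']
    print(context)
```

Routing narrows the candidate pool before any scoring happens: the sports document that also contains "retrieval" is never scored for this query. That is the efficiency argument for routing, traded against the risk of missing relevant documents when the router picks the wrong sub-index, which is why the sketch falls back to all sub-indexes when no keyword matches.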

Datasets

LiveRAG/Reports (dataset)
