Domain-oriented RAG Assessment (DoRA): Synthetic Benchmarking for RAG-based Question Answering on Defense Documents

Bao Gia Doan; Aditya Joshi; Pantelis Elinas; Aarya Bodhankar; Oscar Leslie; Tom Marchant; Flora Salim

arXiv:2604.17943·cs.CL·April 21, 2026

TL;DR

DoRA is a domain-grounded benchmark of 6.5K synthetic QA instances for RAG-based question answering on defense documents. Each instance is paired with auditable evidence passages, so both answer quality and attribution can be evaluated under domain shift.

Contribution

The paper introduces a synthetic, domain-grounded benchmark with auditable evidence passages, covering five question types (find, explain, summarize, generate, provide) for RAG evaluation on defense documents.

Findings

01

A model trained on DoRA (DoRA SFT) improves QA task success by up to 26% over its base model, Llama3.1-8B-Instruct.

02

DoRA SFT reduces the hallucination rate by 47% in RAG faithfulness scores compared with the same base model.

03

In end-to-end evaluation with a fixed dense retriever, general-purpose LMs perform similarly to one another on the benchmark.

Abstract

Open-domain RAG benchmarks over public corpora can overestimate deployment performance due to pretraining overlap and weak attribution requirements. We present DoRA (Domain-oriented RAG Assessment), a domain-grounded benchmark built from defense documents that pairs synthetic, intent-conditioned QA (question answering) with auditable evidence passages for attribution. DoRA covers five question types (find, explain, summarize, generate, provide) and contains 6.5K curated instances. In end-to-end evaluation with a fixed dense retriever, general-purpose Language Models (LMs) perform similarly, while a model trained on DoRA (DoRA SFT) yields large gains over the base model (Llama3.1-8B-Instruct): up to 26% improvement in QA task success, while reducing the hallucination rate by 47% in RAG faithfulness scores, supporting contamination-aware regression testing under domain shift.
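
To make the benchmark structure concrete, here is a minimal sketch of how one instance with auditable evidence might be represented and checked against a retriever's output. The abstract does not specify a data schema or a retrieval metric, so the field names (`question_type`, `evidence_ids`), the `BenchmarkInstance` class, and the `evidence_recall` helper are all hypothetical illustrations, not the authors' format or scoring method. Only the five question types are taken from the abstract.

```python
from dataclasses import dataclass, field

# The five question types named in the abstract.
QUESTION_TYPES = ("find", "explain", "summarize", "generate", "provide")


@dataclass
class BenchmarkInstance:
    """Hypothetical record layout; the real DoRA schema is not given in the abstract."""
    question: str
    question_type: str
    answer: str
    evidence_ids: list[str] = field(default_factory=list)  # gold evidence passages

    def __post_init__(self) -> None:
        if self.question_type not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.question_type!r}")


def evidence_recall(instance: BenchmarkInstance, retrieved_ids: list[str]) -> float:
    """Fraction of gold evidence passages present among the retrieved passages.

    An illustrative attribution check, assumed here for the sketch only.
    """
    gold = set(instance.evidence_ids)
    if not gold:
        return 0.0
    return len(gold & set(retrieved_ids)) / len(gold)


example = BenchmarkInstance(
    question="Placeholder question text",
    question_type="find",
    answer="Placeholder answer text",
    evidence_ids=["p1", "p2"],
)
recall = evidence_recall(example, ["p2", "p9"])  # one of two gold passages retrieved
```

Keeping the gold evidence identifiers on each instance is what makes a check like this auditable: a reviewer can trace any score back to the specific passages that were expected and the ones that were actually retrieved.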
