RAGAPHENE: A RAG Annotation Platform with Human Enhancements and Edits

Kshitij Fadnis; Sara Rosenthal; Maeda Hanafi; Yannis Katsis; Marina Danilevsky

arXiv:2508.19272 · cs.CL · August 28, 2025

TL;DR

RAGAPHENE is a chat-based annotation platform designed to simulate real-world conversations, enabling better benchmarking and evaluation of Large Language Models' retrieval-augmented generation capabilities.

Contribution

The paper introduces RAGAPHENE, a novel platform that facilitates human-enhanced, multi-turn RAG conversation annotation for LLM evaluation.

Findings

01

Successfully used by approximately 40 annotators to build thousands of real-world conversations.

02

Aims to improve the quality of RAG benchmarks by having human annotators simulate real-world conversations.

03

Supports annotation of multi-turn conversations that model real-world use.

Abstract

Retrieval Augmented Generation (RAG) is an important aspect of conversing with Large Language Models (LLMs) when factually correct information is important. LLMs may provide answers that appear correct, but could contain hallucinated information. Thus, building benchmarks that can evaluate LLMs on multi-turn RAG conversations has become an increasingly important task. Simulating real-world conversations is vital for producing high quality evaluation benchmarks. We present RAGAPHENE, a chat-based annotation platform that enables annotators to simulate real-world conversations for benchmarking and evaluating LLMs. RAGAPHENE has been successfully used by approximately 40 annotators to build thousands of real-world conversations.
