RAGAPHENE: A RAG Annotation Platform with Human Enhancements and Edits
Kshitij Fadnis, Sara Rosenthal, Maeda Hanafi, Yannis Katsis, Marina Danilevsky

TL;DR
RAGAPHENE is a chat-based annotation platform designed to simulate real-world conversations, enabling better benchmarking and evaluation of Large Language Models' retrieval-augmented generation capabilities.
Contribution
The paper introduces RAGAPHENE, a novel platform that facilitates human-enhanced, multi-turn RAG conversation annotation for LLM evaluation.
Findings
Successfully used by approximately 40 annotators to build thousands of conversations.
Improves the quality of RAG benchmarks by having human annotators simulate real-world conversations.
Supports multi-turn, real-world conversation modeling.
Abstract
Retrieval Augmented Generation (RAG) is an important aspect of conversing with Large Language Models (LLMs) when factually correct information is important. LLMs may provide answers that appear correct but could contain hallucinated information. Thus, building benchmarks that can evaluate LLMs on multi-turn RAG conversations has become an increasingly important task. Simulating real-world conversations is vital for producing high-quality evaluation benchmarks. We present RAGAPHENE, a chat-based annotation platform that enables annotators to simulate real-world conversations for benchmarking and evaluating LLMs. RAGAPHENE has been successfully used by approximately 40 annotators to build thousands of real-world conversations.
