RankArena: A Unified Platform for Evaluating Retrieval, Reranking and RAG with Human and LLM Feedback

Abdelrahman Abdallah; Mahmoud Abdalla; Bhawna Piryani; Jamshid Mozafari; Mohammed Ali; Adam Jatowt

arXiv:2508.05512·cs.IR·August 8, 2025

TL;DR

RankArena is a platform for evaluating retrieval, reranking, and RAG systems with both human and LLM feedback; the feedback it collects can be used to analyse and train retrieval models.

Contribution

The paper introduces a unified, scalable platform that supports several evaluation modes for retrieval and RAG systems, collects feedback on them, and integrates human and LLM judgments.

Findings

01

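Supports four evaluation modes: direct reranking visualisation, blind pairwise comparisons with human or LLM voting, supervised manual document annotation, and end-to-end RAG answer quality assessment.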

02

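Captures fine-grained relevance feedback through pairwise preferences and full-list annotations, with auxiliary metadata such as movement metrics, annotation time, and quality ratings.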

03

Enables comparison between model rankings and human annotations.
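
This page does not say how model rankings are compared with human annotations. One common way to quantify agreement between two orderings of the same documents is Kendall's tau, sketched below as an illustration only. The function name `kendall_tau` and the document IDs are hypothetical and are not taken from RankArena.

```python
from itertools import combinations


def kendall_tau(model_ranking, human_ranking):
    """Kendall tau-a between two tie-free rankings of the same document IDs.

    Returns a value in [-1, 1]: 1 means identical order, -1 means reversed.
    This is an illustrative metric choice, not RankArena's documented method.
    """
    if len(set(model_ranking)) != len(model_ranking):
        raise ValueError("model_ranking contains duplicate document IDs")
    if set(model_ranking) != set(human_ranking) or len(model_ranking) != len(human_ranking):
        raise ValueError("both rankings must contain the same document IDs")
    n = len(model_ranking)
    if n < 2:
        raise ValueError("need at least two documents")

    # Position of each document in the human ranking.
    human_pos = {doc: i for i, doc in enumerate(human_ranking)}

    concordant = discordant = 0
    # combinations() yields pairs (a, b) with a ranked before b by the model.
    for a, b in combinations(model_ranking, 2):
        if human_pos[a] < human_pos[b]:
            concordant += 1
        else:
            discordant += 1

    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    model = ["d1", "d2", "d3"]   # hypothetical reranker output
    human = ["d2", "d1", "d3"]   # hypothetical annotator ordering
    print(kendall_tau(model, human))
```

A value close to 1 would indicate that the reranker orders documents much as the annotator did. Rankings with ties or partial annotations would need a tie-aware variant.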

Abstract

Evaluating the quality of retrieval-augmented generation (RAG) and document reranking systems remains challenging due to the lack of scalable, user-centric, and multi-perspective evaluation tools. We introduce RankArena, a unified platform for comparing and analysing the performance of retrieval pipelines, rerankers, and RAG systems using structured human and LLM-based feedback as well as for collecting such feedback. RankArena supports multiple evaluation modes: direct reranking visualisation, blind pairwise comparisons with human or LLM voting, supervised manual document annotation, and end-to-end RAG answer quality assessment. It captures fine-grained relevance feedback through both pairwise preferences and full-list annotations, along with auxiliary metadata such as movement metrics, annotation time, and quality ratings. The platform also integrates LLM-as-a-judge evaluation,…
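
As a rough illustration of the feedback described in the abstract, the sketch below aggregates blind pairwise votes into per-system win rates and computes one possible "movement metric" for a reranked list. Everything here is an assumption made for illustration: the vote tuple format, the model names, and the definition of movement as mean absolute rank displacement are not specified on this page and may differ from what RankArena actually records.

```python
from collections import defaultdict


def win_rates(votes):
    """Aggregate pairwise votes into a win rate per system.

    votes: iterable of (system_a, system_b, winner), where winner is
    "a", "b", or "tie". A tie counts as half a win for each side.
    (Hypothetical record format, assumed for this sketch.)
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for system_a, system_b, winner in votes:
        if winner not in ("a", "b", "tie"):
            raise ValueError(f"unknown winner label: {winner!r}")
        games[system_a] += 1
        games[system_b] += 1
        if winner == "a":
            wins[system_a] += 1.0
        elif winner == "b":
            wins[system_b] += 1.0
        else:
            wins[system_a] += 0.5
            wins[system_b] += 0.5
    return {system: wins[system] / games[system] for system in games}


def mean_abs_movement(before, after):
    """Average number of rank positions each document moved after reranking.

    Assumed definition of a movement metric; both lists must hold the
    same document IDs.
    """
    if not before or set(before) != set(after) or len(before) != len(after):
        raise ValueError("before and after must be non-empty and hold the same IDs")
    after_pos = {doc: i for i, doc in enumerate(after)}
    return sum(abs(i - after_pos[doc]) for i, doc in enumerate(before)) / len(before)


if __name__ == "__main__":
    votes = [
        ("retriever_only", "reranker_x", "b"),   # hypothetical system names
        ("retriever_only", "reranker_x", "b"),
        ("retriever_only", "reranker_x", "tie"),
    ]
    print(win_rates(votes))
    print(mean_abs_movement(["d1", "d2", "d3"], ["d3", "d2", "d1"]))
```

Raw win rates are the simplest aggregate. Arena-style leaderboards often fit a rating model such as Elo or Bradley-Terry instead, which accounts for the strength of the opponents each system faced.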
