Deepchecks: Evaluating Retrieval-Augmented Generation (RAG)

Assaf Gerner; Netta Madvil; Nadav Barak; Alex Zaikman; Jonatan Liberman; Liron Hamra; Rotem Brazilay; Shay Tsadok; Yaron Friedman; Neal Harow; Noam Bresler; Shir Chorev; Philip Tannor; and Lior Rokach

arXiv:2605.14488·cs.AI·May 15, 2026

TL;DR

Deepchecks is a comprehensive evaluation framework designed specifically for Retrieval-Augmented Generation systems, addressing their unique challenges in assessing reliability, relevance, and user satisfaction.

Contribution

The paper introduces a multi-faceted evaluation approach, including root cause analysis and production monitoring, tailored for RAG applications.

Findings

01. Provides a robust foundation for RAG system assessment

02. Addresses evaluation challenges due to stochastic outputs

03. Ensures alignment with application-specific requirements

Abstract

Large Language Models (LLMs) augmented with Retrieval-Augmented Generation (RAG) techniques are revolutionizing applications across multiple domains, such as healthcare, finance, and customer service. Despite their potential, evaluating RAG systems remains a complex challenge due to the stochastic nature of generated outputs and the intricate interplay between retrieval and generation components. This paper introduces Deepchecks, a comprehensive framework tailored for evaluating RAG applications. Deepchecks addresses the evaluation of RAG applications through a multi-faceted approach, root cause analysis, and production monitoring. By ensuring alignment with application-specific requirements, the Deepchecks framework provides a robust foundation for assessing reliability, relevance, and user satisfaction in RAG systems.
