GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model

Xinzhe Li; Ming Liu; Shang Gao

arXiv:2404.19232·cs.CL·October 24, 2024

TL;DR

GRAMMAR is a comprehensive evaluation framework designed to diagnose and assess the performance of retrieval-augmented language models in closed-domain settings, addressing challenges of data privacy and module failure analysis.

Contribution

The paper introduces GRAMMAR, a novel modular evaluation methodology with a grounded data generation process for diagnosing RAG system failures in closed domains.

Findings

1. Effectively identifies vulnerable modules in RAG systems.
2. Supports hypothesis testing for text vulnerability analysis.
3. Provides a reliable, open-source evaluation tool.

Abstract

Retrieval-Augmented Generation (RAG) systems are widely used across various industries for querying closed-domain and in-house knowledge bases. However, evaluating these systems presents significant challenges due to the private nature of closed-domain data and a scarcity of queries with verifiable ground truths. Moreover, there is a lack of analytical methods to diagnose problematic modules and identify types of failure, such as those caused by knowledge deficits or issues with robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising a grounded data generation process and an evaluation protocol that effectively pinpoints defective modules. Our validation experiments reveal that GRAMMAR provides a reliable approach for identifying vulnerable modules and supports hypothesis testing for…
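
To make the idea of pinpointing a defective module concrete, here is a minimal sketch. It is not the GRAMMAR implementation. It assumes each evaluated query comes with two flags: whether the retriever returned the ground-truth evidence, and whether the final answer was correct. The names `QueryResult`, `attribute_failure`, and `attribute_failures` are hypothetical, and the sketch does not attempt the abstract's further distinction between knowledge deficits and robustness issues.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QueryResult:
    """Outcome of one evaluated query (hypothetical record, not from the paper)."""
    query: str
    retrieved_gold: bool   # did the retriever return the ground-truth evidence?
    answer_correct: bool   # did the final answer match the ground truth?


def attribute_failure(r: QueryResult) -> str:
    """Assign one query to a coarse outcome bucket."""
    if r.answer_correct:
        # Correct answers are split by whether the evidence was actually retrieved.
        return "ok" if r.retrieved_gold else "ok_without_evidence"
    # Wrong answer: blame the generator only if it was given the right evidence.
    return "generator" if r.retrieved_gold else "retriever"


def attribute_failures(results):
    """Count how many queries fall into each bucket."""
    return Counter(attribute_failure(r) for r in results)


# Toy data, invented for illustration only.
results = [
    QueryResult("q1", retrieved_gold=True, answer_correct=True),
    QueryResult("q2", retrieved_gold=False, answer_correct=False),
    QueryResult("q3", retrieved_gold=True, answer_correct=False),
    QueryResult("q4", retrieved_gold=False, answer_correct=False),
]
summary = attribute_failures(results)
print(dict(summary))
```

This kind of bucketing depends on queries with verifiable ground truths, which is the gap the grounded data generation process is described as filling.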

Code & Models

Repositories

xinzhel/grammar (official)

Taxonomy

Topics: Topic Modeling