GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model
Xinzhe Li, Ming Liu, Shang Gao

TL;DR
GRAMMAR is a comprehensive evaluation framework designed to diagnose and assess the performance of retrieval-augmented language models in closed-domain settings, addressing challenges of data privacy and module failure analysis.
Contribution
The paper introduces GRAMMAR, a novel modular evaluation methodology with a grounded data generation process for diagnosing RAG system failures in closed domains.
Findings
Effectively identifies vulnerable modules in RAG systems.
Supports hypothesis testing for text vulnerability analysis.
Provides a reliable, open-source evaluation tool.
Abstract
Retrieval-Augmented Generation (RAG) systems are widely used across various industries for querying closed-domain and in-house knowledge bases. However, evaluating these systems presents significant challenges due to the private nature of closed-domain data and a scarcity of queries with verifiable ground truths. Moreover, there is a lack of analytical methods to diagnose problematic modules and identify types of failure, such as those caused by knowledge deficits or issues with robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising a grounded data generation process and an evaluation protocol that effectively pinpoints defective modules. Our validation experiments reveal that GRAMMAR provides a reliable approach for identifying vulnerable modules and supports hypothesis testing for…
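As an illustration of what "pinpointing defective modules" could look like in practice, here is a minimal sketch. It is not the paper's protocol: the record fields (`retrieved_gold`, `answer_correct`), the function names, and the two-way retriever/generator split are all assumptions made for this example. The idea shown is simply that, once each query has a verifiable ground truth, a wrong answer can be attributed to the retriever when the gold evidence was never retrieved, and to the generator otherwise.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QueryResult:
    """Hypothetical per-query record; field names are assumptions."""
    query: str
    retrieved_gold: bool   # did the retriever return the ground-truth evidence?
    answer_correct: bool   # did the final answer match the ground truth?


def attribute_failure(result: QueryResult) -> str:
    """Assign a failed query to the module most likely responsible."""
    if result.answer_correct:
        # Correct answers are not failures, even if retrieval missed.
        return "ok"
    if not result.retrieved_gold:
        # The generator never saw the evidence it needed.
        return "retriever"
    # Evidence was available but the answer was still wrong.
    return "generator"


def summarize(results: list[QueryResult]) -> Counter:
    """Count outcomes per module across an evaluation set."""
    return Counter(attribute_failure(r) for r in results)


# Toy evaluation set (fabricated for illustration only).
results = [
    QueryResult("q1", retrieved_gold=True, answer_correct=True),
    QueryResult("q2", retrieved_gold=False, answer_correct=False),
    QueryResult("q3", retrieved_gold=True, answer_correct=False),
    QueryResult("q4", retrieved_gold=False, answer_correct=False),
]
summary = summarize(results)
print(dict(summary))
```

A split like this only separates the two modules; distinguishing knowledge deficits from robustness issues, as the abstract describes, would need additional signals that this sketch does not model.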
Taxonomy
Topics: Topic Modeling
