Evaluation of RAG Metrics for Question Answering in the Telecom Domain
Sujoy Roychowdhury, Sumit Soman, H G Ranjani, Neeraj Gunda, Vansh Chhabra, Sai Krishna Bala

TL;DR
This paper evaluates RAG metrics for question answering in the telecom domain, modifies an existing evaluation library, and analyzes the effectiveness and challenges of these metrics in specialized, real-world scenarios.
Contribution
It introduces a modified RAGAS evaluation package with additional metrics and provides an analysis of their performance and limitations in the telecom domain.
Findings
Some metrics correlate with correct retrievals.
Domain adaptation affects metric scores.
Challenges exist in applying these metrics in real-world telecom QA.
Abstract
Retrieval Augmented Generation (RAG) is widely used to enable Large Language Models (LLMs) to perform Question Answering (QA) tasks in various domains. However, for RAG based on open-source LLMs in specialized domains, evaluating the generated responses is challenging. A popular framework in the literature is RAG Assessment (RAGAS), a publicly available library that uses LLMs for evaluation. One disadvantage of RAGAS is that it gives little detail on how the numerical values of its evaluation metrics are derived. One outcome of this work is a modified version of this package for a few metrics (faithfulness, context relevance, answer relevance, answer correctness, answer similarity and factual correctness), through which we provide the intermediate outputs of the prompts using any LLM. Next, we analyse the expert evaluations of the output of the modified RAGAS package and observe the challenges of…
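To illustrate what "intermediate outputs" can mean for one of these metrics, the following is a minimal sketch of a faithfulness-style score, computed as the fraction of answer statements that the retrieved context supports, with the per-statement verdicts kept alongside the final number. It is not the authors' modified package and not the RAGAS API. The names `FaithfulnessResult`, `faithfulness`, and `toy_judge` are hypothetical, the statements are assumed to have been extracted from the answer already, and a simple substring check stands in for the LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FaithfulnessResult:
    statements: List[str]  # statements assumed to be extracted from the answer
    verdicts: List[bool]   # per-statement verdict: supported by the context?
    score: float           # supported statements / total statements


def faithfulness(
    statements: List[str],
    context: str,
    is_supported: Callable[[str, str], bool],
) -> FaithfulnessResult:
    """Score the statements and keep the intermediate verdicts for inspection."""
    verdicts = [is_supported(s, context) for s in statements]
    # An empty statement list is scored 0.0 here; this is an assumption.
    score = sum(verdicts) / len(verdicts) if verdicts else 0.0
    return FaithfulnessResult(statements, verdicts, score)


def toy_judge(statement: str, context: str) -> bool:
    # Hypothetical stand-in for an LLM judge: case-insensitive substring match.
    return statement.lower() in context.lower()


context = (
    "The gNB is the 5G base station. "
    "It connects to the 5G core over the NG interface."
)
statements = [
    "The gNB is the 5G base station",       # present in the context
    "The gNB connects to the EPC over S1",  # not present in the context
]

result = faithfulness(statements, context, toy_judge)
print(result.verdicts, result.score)
```

Returning the verdicts together with the score lets a domain expert see which statements pulled the score down, which is the kind of inspection a single number does not allow.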
Taxonomy
Topics: Expert finding and Q&A systems
Methods: Attention Is All You Need · Attention Dropout · Linear Warmup With Linear Decay · Residual Connection · Adam · Dropout · Byte Pair Encoding · Layer Normalization · Linear Layer
