Quality Assurance for LLM-RAG Systems: Empirical Insights from Tourism Application Testing
Bestoun S. Ahmed, Ludwig Otto Baader, Firas Bayram, Siri Jagstedt, Peter Magnusson

TL;DR
This paper develops a comprehensive testing framework for LLM-RAG systems in tourism, evaluating multiple models and configurations to assess quality and performance, with practical insights for deployment.
Contribution
It introduces a systematic empirical testing methodology with 17 metrics for LLM-RAG systems, analyzing the impact of architectural choices and parameters.
Findings
Newer LLM versions show modest performance improvements.
Response length and complexity are more affected than semantic quality.
Temperature and top-p parameters significantly influence response quality.
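As background for the last finding, the sketch below shows how temperature and top-p typically act on a model's next-token distribution. It is a minimal illustration under stated assumptions, not the paper's code: the function name sample_token, its arguments, and the toy logits are hypothetical, and real inference stacks may apply these steps in a different order or combine them with other filters.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick a token index using temperature scaling and nucleus (top-p) filtering.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax over the scaled logits.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: keep the smallest set of most-probable tokens
    # whose cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw one token from the kept set, weighted by its probability
    # (random.choices normalises the weights internally).
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
    for temperature, top_p in [(0.2, 1.0), (1.0, 0.5), (1.5, 1.0)]:
        draws = [sample_token(toy_logits, temperature, top_p) for _ in range(1000)]
        counts = [draws.count(i) for i in range(len(toy_logits))]
        print(f"temperature={temperature}, top_p={top_p}: counts={counts}")
```

Running the script prints how often each toy token is drawn under each setting. Low temperature or a small top-p concentrates draws on the highest-scoring token, while higher values spread them out, which is one plausible reason these two parameters affect response quality.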
Abstract
This paper presents a comprehensive framework for testing and evaluating quality characteristics of Large Language Model (LLM) systems enhanced with Retrieval-Augmented Generation (RAG) in tourism applications. Through systematic empirical evaluation of three different LLM variants across multiple parameter configurations, we demonstrate the effectiveness of our testing methodology in assessing both functional correctness and extra-functional properties. Our framework implements 17 distinct metrics that encompass syntactic analysis, semantic evaluation, and behavioral evaluation through LLM judges. The study reveals significant information about how different architectural choices and parameter configurations affect system performance, particularly highlighting the impact of temperature and top-p parameters on response quality. The tests were carried out on a tourism recommendation…
Taxonomy
Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis · Multi-Agent Systems and Negotiation
