Evaluating the Performance of RAG Methods for Conversational AI in the Airport Domain

Yuyang Li; Philip J.M. Kerbusch; Raimon H.R. Pruim; Tobias Käfer

arXiv:2505.13006·cs.CL·May 20, 2025

TL;DR

This paper evaluates three Retrieval-Augmented Generation (RAG) methods for conversational AI in airports, highlighting Graph RAG's higher accuracy and lower hallucination risk on dynamic queries that require reasoning.

Contribution

The paper builds and compares three RAG-based approaches (traditional RAG, SQL RAG, and Graph RAG) for airport-specific conversational AI, emphasizing Graph RAG's accuracy and its reduced hallucination risk.

Findings

01 Graph RAG achieved 91.49% accuracy with fewer hallucinations than traditional RAG.

02 Traditional RAG (BM25 + GPT-4) achieved 84.84% accuracy but occasionally produced hallucinations.

03 SQL RAG and Graph RAG are recommended for airport environments.

Abstract

Airports ranked among the top 20 by annual passenger numbers are highly dynamic environments with thousands of flights daily, and they aim to increase their degree of automation. To contribute to this, we implemented a conversational AI system that enables airport staff to communicate with flight information systems. The system not only answers standard airport queries but also resolves airport terminology, jargon, and abbreviations, and handles dynamic questions that involve reasoning. In this paper, we built three different Retrieval-Augmented Generation (RAG) methods: traditional RAG, SQL RAG, and Knowledge Graph-based RAG (Graph RAG). Experiments showed that traditional RAG achieved 84.84% accuracy using BM25 + GPT-4 but occasionally produced hallucinations, which poses a risk to airport safety. In contrast, SQL RAG and Graph RAG achieved 80.85% and 91.49% accuracy respectively, with…
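The abstract names BM25 as the retriever in the traditional RAG pipeline. As a rough illustration of that retrieval step only, the sketch below scores a few passages with the standard Okapi BM25 formula and assembles a prompt for a language model. It is not the authors' implementation: the flight records, the function names (`bm25_scores`, `retrieve`, `build_prompt`), the whitespace tokenizer, and the parameter values `k1=1.5` and `b=0.75` are all assumptions made for the example, and the call to the language model is left out.

```python
import math
from collections import Counter

# Hypothetical flight-information passages, invented for illustration only.
DOCS = [
    "Flight KL1001 departs from gate D5 at 09:40",
    "Flight KL0605 arrives at gate E7 at 11:15",
    "Baggage belt 12 is reserved for flight KL0605",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def retrieve(query, docs, top_k=1):
    """Return the top_k documents ranked by BM25 score, best first."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:top_k]]


def build_prompt(query, passages):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


QUERY = "which gate does flight KL1001 depart from"
PROMPT = build_prompt(QUERY, retrieve(QUERY, DOCS, top_k=1))
```

In a pipeline of this shape the generator only sees what the retriever returns, so a missing or irrelevant passage leaves the model to fill the gap on its own, which is one plausible route to the hallucinations the abstract reports for traditional RAG.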

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks

Methods: Attention Is All You Need · Label Smoothing · Linear Warmup With Linear Decay · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Attention Dropout · WordPiece