Is Our Chatbot Telling Lies? Assessing Correctness of an LLM-based Dutch Support Chatbot
Herman Lassche (1, 2), Michiel Overeem (1), Ayushi Rastogi (2) ((1) AFAS Software, (2) University of Groningen)

TL;DR
This paper develops a method to automatically assess the correctness of Dutch LLM-based support chatbot responses, aiming to identify misleading answers and improve customer service quality.
Contribution
It introduces a novel definition and metrics for correctness, and proposes improvements tailored to regional language and question types.
Findings
The automated system identifies wrong responses in 55% of cases.
Defines correctness based on support team decision-making.
Provides suggestions for enhancing correctness in regional language and question types.
Abstract
Companies support their customers through live chats and chatbots to earn their loyalty. AFAS is a Dutch company that aims to use large language models (LLMs) to answer customer queries with minimal to no input from its customer support team. The task is complicated by the fact that it is unclear what makes a response correct, particularly in Dutch. Further, with minimal data available for training, the challenge is to identify on the fly whether an answer generated by a large language model is correct. This study is the first to define the correctness of a response based on how the support team at AFAS makes decisions. It leverages literature on natural language generation and automated answer grading systems to automate the decision-making of the customer support team. We investigated questions requiring a binary response (e.g., Would it be possible to adjust tax…
Taxonomy
Topics: Artificial Intelligence in Law · European and International Law Studies
