Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues
Norbert Braunschweiler, Rama Doddipatla, Simon Keizer, and Svetlana Stoyanchev

TL;DR
This study evaluates large language models like ChatGPT for generating document-grounded responses in information-seeking dialogues, highlighting their strengths and limitations through human assessments.
Contribution
It introduces a comparison of ChatGPT-based methods for document-grounded response generation and assesses their performance with human evaluation in social service domains.
Findings
ChatGPT variants often include hallucinated information.
Human evaluators rated ChatGPT responses higher than other systems.
Automatic metrics are inadequate for evaluating verbose, document-grounded responses.
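To illustrate why overlap-based automatic metrics can penalize verbose responses, here is a minimal sketch of a token-level F1 score. The choice of metric, the function name `token_f1`, and the example strings are assumptions made for illustration; they are not taken from the paper's evaluation setup.

```python
from collections import Counter


def token_f1(response: str, reference: str) -> float:
    """Token-level F1 between a response and a reference.

    Tokens are lowercased, whitespace-separated words (a simplifying assumption).
    """
    resp_tokens = response.lower().split()
    ref_tokens = reference.lower().split()
    if not resp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(resp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(resp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up example strings, not drawn from MultiDoc2Dial.
reference = "you can renew your license online"
concise = "you can renew your license online"
verbose = (
    "yes you can renew your license online and you may also want to "
    "check the fees page before you start"
)

concise_score = token_f1(concise, reference)
verbose_score = token_f1(verbose, reference)
```

In this sketch the verbose response contains every reference token, so recall stays at 1.0, but the extra words lower precision and the F1 score drops well below that of the concise response, even though a human reader might find both answers acceptable.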
Abstract
In this paper, we investigate the use of large language models (LLMs) like ChatGPT for document-grounded response generation in the context of information-seeking dialogues. For evaluation, we use the MultiDoc2Dial corpus of task-oriented dialogues in four social service domains previously used in the DialDoc 2022 Shared Task. Information-seeking dialogue turns are grounded in multiple documents providing relevant information. We generate dialogue completion responses by prompting a ChatGPT model, using two methods: ChatCompletion and LlamaIndex. ChatCompletion uses knowledge from ChatGPT model pretraining while LlamaIndex also extracts relevant information from documents. Observing that document-grounded response generation via LLMs cannot be adequately assessed by automatic evaluation metrics because LLM responses are significantly more verbose, we perform a human evaluation where annotators rate…
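The difference between the two prompting methods can be sketched as follows: one prompt contains only the dialogue history, while the other also includes passages retrieved from the documents. This is a rough illustration under stated assumptions and not the authors' implementation. It does not call the OpenAI or LlamaIndex APIs; the helper names `retrieve` and `build_prompt`, the prompt wording, the word-overlap ranking, and the example data are all hypothetical.

```python
from typing import List, Optional, Tuple


def retrieve(query: str, passages: List[str], top_k: int = 1) -> List[str]:
    """Rank passages by word overlap with the query (a stand-in for a real index)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(
    history: List[Tuple[str, str]],
    passages: Optional[List[str]] = None,
) -> str:
    """Build a dialogue-completion prompt, optionally grounded in passages."""
    lines: List[str] = []
    if passages:
        # Grounded variant: put the retrieved passages ahead of the dialogue.
        lines.append("Answer using only the following passages:")
        lines.extend(f"- {p}" for p in passages)
    lines.extend(f"{speaker}: {utterance}" for speaker, utterance in history)
    lines.append("agent:")  # the model is asked to complete the agent turn
    return "\n".join(lines)


# Made-up example data for illustration.
history = [("user", "How do I renew my driver license?")]
documents = [
    "You can renew a driver license online or by mail.",
    "Student loans can be deferred while enrolled.",
]

# Variant 1: dialogue history only, relying on the model's pretraining knowledge.
ungrounded_prompt = build_prompt(history)

# Variant 2: history plus the passage that best matches the last user turn.
grounded_prompt = build_prompt(history, retrieve(history[-1][1], documents))
```

Either prompt string would then be sent to a chat model. Only the second gives the model access to document content at generation time.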
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Mental Health via Writing
