Evaluating Large Language Models for Document-grounded Response   Generation in Information-Seeking Dialogues

Norbert Braunschweiler; Rama Doddipatla; Simon Keizer and; Svetlana Stoyanchev

arXiv:2309.11838·cs.CL·September 22, 2023·2 cites

Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues

Norbert Braunschweiler, Rama Doddipatla, Simon Keizer and, Svetlana Stoyanchev

PDF

Open Access

TL;DR

This study evaluates large language models like ChatGPT for generating document-grounded responses in information-seeking dialogues, highlighting their strengths and limitations through human assessments.

Contribution

It introduces a comparison of ChatGPT-based methods for document-grounded response generation and assesses their performance with human evaluation in social service domains.

Findings

01

ChatGPT variants often include hallucinated information.

02

Human evaluators rated ChatGPT responses higher than other systems.

03

Automatic metrics are inadequate for evaluating verbose, document-grounded responses.

Abstract

In this paper, we investigate the use of large language models (LLMs) like ChatGPT for document-grounded response generation in the context of information-seeking dialogues. For evaluation, we use the MultiDoc2Dial corpus of task-oriented dialogues in four social service domains previously used in the DialDoc 2022 Shared Task. Information-seeking dialogue turns are grounded in multiple documents providing relevant information. We generate dialogue completion responses by prompting a ChatGPT model, using two methods: Chat-Completion and LlamaIndex. ChatCompletion uses knowledge from ChatGPT model pretraining while LlamaIndex also extracts relevant information from documents. Observing that document-grounded response generation via LLMs cannot be adequately assessed by automatic evaluation metrics as they are significantly more verbose, we perform a human evaluation where annotators rate…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Text Readability and Simplification · Mental Health via Writing

Methodstravel james