Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents

Shiwei Wu; Chen Zhang; Yan Gao; Qimeng Wang; Tong Xu; Yao Hu; Enhong Chen

arXiv:2410.00526·cs.CL·October 2, 2024

TL;DR

This paper introduces InsCoQA, a new benchmark for evaluating large language models on conversational question answering within instructional documents, addressing the limitations of existing benchmarks by focusing on complex, multi-step guidance tasks.

Contribution

The paper presents InsCoQA, a novel benchmark for assessing LLMs on instructional documents, and proposes InsEval, an LLM-assisted evaluator for response accuracy and integrity.

Findings

01. InsCoQA effectively evaluates LLMs on complex instructional tasks.

02. InsEval provides reliable assessment of response accuracy.

03. Large language models show varied performance on the benchmark.

Abstract

Instructional documents are rich sources of knowledge for completing various tasks, yet their unique challenges in conversational question answering (CQA) have not been thoroughly explored. Existing benchmarks have primarily focused on basic factual question-answering from single narrative documents, making them inadequate for assessing a model's ability to comprehend complex real-world instructional documents and provide accurate step-by-step guidance in daily life. To bridge this gap, we present InsCoQA, a novel benchmark tailored for evaluating large language models (LLMs) in the context of CQA with instructional documents. Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents, reflecting the intricate and multi-faceted nature of real-world…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning