ToolQA: A Dataset for LLM Question Answering with External Tools

Yuchen Zhuang; Yue Yu; Kuan Wang; Haotian Sun; Chao Zhang

arXiv:2306.13304 · cs.CL · June 26, 2023 · 39 cites

TL;DR

ToolQA is a dataset for evaluating how well large language models use external tools to answer questions. It targets a gap in earlier evaluations, which do not separate questions answerable from a model's internal knowledge from those that require external information.

Contribution

The paper introduces ToolQA, a dataset built through a scalable, automated curation process and paired with 13 specialized tools for interacting with external knowledge, so that LLMs' tool-use reasoning can be assessed faithfully.

Findings

1. Sets a new benchmark for LLM tool-use evaluation
2. Highlights strengths and weaknesses of current tool-use LLMs
3. Provides insights for future improvements in LLMs

Abstract

Large Language Models (LLMs) have demonstrated impressive performance in various NLP tasks, but they still suffer from challenges such as hallucination and weak numerical reasoning. To overcome these challenges, external tools can be used to enhance LLMs' question-answering abilities. However, current evaluation methods do not distinguish between questions that can be answered using LLMs' internal knowledge and those that require external information through tool use. To address this issue, we introduce a new dataset called ToolQA, which is designed to faithfully evaluate LLMs' ability to use external tools for question answering. Our development of ToolQA involved a scalable, automated process for dataset curation, along with 13 specialized tools designed for interaction with external knowledge in order to answer questions. Importantly, we strive to minimize the overlap between our…

Code & Models

Videos

ToolQA: A Dataset for LLM Question Answering with External Tools · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research