Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family

Yiming Tan; Dehai Min; Yu Li; Wenbo Li; Nan Hu; Yongrui Chen; Guilin Qi

arXiv:2303.07992·cs.CL·September 21, 2023·22 cites

TL;DR

This study conducts a large-scale, comprehensive evaluation of ChatGPT and related LLMs on complex KBQA tasks across multiple datasets, revealing their strengths and limitations compared to traditional models.

Contribution

It introduces a black-box testing framework for large-scale evaluation of LLMs on complex KBQA questions, covering diverse datasets and multilingual scenarios.

Findings

01. ChatGPT performs well on simple questions but struggles with complex ones.

02. The evaluation highlights specific limitations of GPT models in complex KBQA tasks.

03. Multilingual datasets reveal language-specific challenges for LLMs.

Abstract

ChatGPT is a powerful large language model (LLM) that covers knowledge resources such as Wikipedia and supports natural language question answering using its own knowledge. Therefore, there is growing interest in exploring whether ChatGPT can replace traditional knowledge-based question answering (KBQA) models. Although some works have analyzed the question answering performance of ChatGPT, there is still a lack of large-scale, comprehensive testing on various types of complex questions to analyze the limitations of the model. In this paper, we present a framework that follows the black-box testing specifications of CheckList proposed by Ribeiro et al. We evaluate ChatGPT and its family of LLMs on eight real-world KB-based complex question answering datasets, which include six English datasets and two multilingual datasets. The total number of test cases is approximately…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Attention Is All You Need · Discriminative Fine-Tuning · GPT · Test · Flan-T5 · Weight Decay · Dropout · Softmax