Automatic Question-Answer Generation for Long-Tail Knowledge

Rohan Kumar; Youngmin Kim; Sunitha Ravi; Haitian Sun; Christos Faloutsos; Ruslan Salakhutdinov; Minji Yoon

arXiv:2403.01382·cs.CL·March 5, 2024·1 cites

TL;DR

This paper introduces an automatic method for generating QA datasets focused on long-tail entities, enabling better evaluation of LLMs on uncommon topics, and compares LLM performance with and without external knowledge sources such as Wikipedia and Wikidata knowledge graphs.

Contribution

The paper presents a novel automated approach for creating long-tail QA datasets and explores how external knowledge graphs impact LLM performance on these datasets.

Findings

01. LLMs perform better on long-tail questions with external knowledge.

02. Automated dataset generation reduces manual effort significantly.

03. External resources like Wikipedia improve LLM accuracy on tail entities.

Abstract

Pretrained Large Language Models (LLMs) have gained significant attention for addressing open-domain Question Answering (QA). While they exhibit high accuracy in answering questions related to common knowledge, LLMs encounter difficulties in learning about uncommon long-tail knowledge (tail entities). Since manually constructing QA datasets demands substantial human resources, the types of existing QA datasets are limited, leaving us with a scarcity of datasets to study the performance of LLMs on tail entities. In this paper, we propose an automatic approach to generate specialized QA datasets for tail entities and present the associated research challenges. We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets, comparing their performance with and without external resources including Wikipedia and Wikidata knowledge graphs.
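
The abstract does not spell out the generation pipeline, so the following is only an illustrative sketch of one way QA pairs for tail entities could be derived from knowledge-graph triples. Everything in it is an assumption made for illustration: the toy triples, the relation names, the question templates, the use of subject degree as a popularity proxy, and the `max_degree` threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy triples (subject, relation, object); not data from the paper.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("Paris", "located_in", "Europe"),
    ("Paris", "twinned_with", "Rome"),
    ("Tuvalu", "capital", "Funafuti"),
]

# Hypothetical question templates keyed by relation name.
TEMPLATES = {
    "capital_of": "{subject} is the capital of which country?",
    "capital": "What is the capital of {subject}?",
}


def entity_degree(triples):
    """Count how many triples each entity appears in as subject.

    Assumption: a low count is used here as a stand-in for "long-tail".
    """
    return Counter(subject for subject, _, _ in triples)


def generate_tail_qa(triples, templates, max_degree=1):
    """Build QA pairs only for subjects whose degree is at most max_degree."""
    degree = entity_degree(triples)
    qa_pairs = []
    for subject, relation, obj in triples:
        # Skip well-connected (head) entities and relations without a template.
        if degree[subject] <= max_degree and relation in templates:
            question = templates[relation].format(subject=subject)
            qa_pairs.append({"question": question, "answer": obj})
    return qa_pairs


if __name__ == "__main__":
    for pair in generate_tail_qa(TRIPLES, TEMPLATES, max_degree=1):
        print(pair["question"], "->", pair["answer"])
```

In this sketch, raising `max_degree` admits more popular entities, so the same function can produce both a tail-only set and a mixed set for comparison.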

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems