KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding

Zijun Yao; Yantao Liu; Xin Lv; Shulin Cao; Jifan Yu; Lei Hou; Juanzi Li

arXiv:2307.03115 · cs.CL · July 7, 2023

TL;DR

KoRC is a new benchmark for deep text understanding that emphasizes broad knowledge coverage and flexible answers, revealing that current models still struggle with this challenging task.

Contribution

The paper introduces KoRC, a knowledge-oriented reading comprehension benchmark with extensive knowledge coverage and answer formats beyond spans or choices.

Findings

01

The best baseline achieves only 68.3% F1 on the in-distribution test set.

02

Models achieve only 30.0% F1 on out-of-distribution data.

03

Deep text understanding remains a significant challenge.

Abstract

Deep text understanding, which requires connecting a given document with prior knowledge beyond its text, has been highlighted by many benchmarks in recent years. However, these benchmarks have two major limitations. On the one hand, most of them require human annotation of knowledge, which leads to limited knowledge coverage. On the other hand, they usually use choices or spans in the texts as the answers, which results in a narrow answer space. To overcome these limitations, we build a new challenging benchmark named KoRC in this paper. Compared with previous benchmarks, KoRC has two advantages, i.e., broad knowledge coverage and flexible answer format. Specifically, we utilize massive knowledge bases to guide annotators or large language models (LLMs) to construct knowledgeable questions. Moreover, we use labels in knowledge bases rather than spans or choices as…

Code & Models

Repositories

thu-keg/korc
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification