KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding
Zijun Yao, Yantao Liu, Xin Lv, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li

TL;DR
KoRC is a new benchmark for deep text understanding that emphasizes broad knowledge coverage and flexible answers, revealing that current models still struggle with this challenging task.
Contribution
The paper introduces KoRC, a knowledge-oriented reading comprehension benchmark with extensive knowledge coverage and answer formats beyond spans or choices.
Findings
State-of-the-art models achieve only 68.3% F1 on in-distribution data.
Models achieve only 30.0% F1 on out-of-distribution data.
Deep text understanding remains a significant challenge.
Abstract
Deep text understanding, which requires connecting a given document with prior knowledge beyond its text, has been highlighted by many benchmarks in recent years. However, these benchmarks have two major limitations. On the one hand, most of them require human annotation of knowledge, which leads to limited knowledge coverage. On the other hand, they usually use choices or spans in the texts as answers, which results in a narrow answer space. To overcome these limitations, we build a new challenging benchmark named KoRC in this paper. Compared with previous benchmarks, KoRC has two advantages: broad knowledge coverage and a flexible answer format. Specifically, we utilize massive knowledge bases to guide annotators or large language models (LLMs) to construct knowledgeable questions. Moreover, we use labels in knowledge bases rather than spans or choices as…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
