BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models

Chuyuan Li; Giuseppe Carenini

arXiv:2511.13095 · cs.CL · January 27, 2026

TL;DR

BeDiscovER is a comprehensive benchmark suite for evaluating discourse understanding in modern language models, covering diverse tasks from discourse parsing to semantic phenomena, revealing strengths and weaknesses of current models.

Contribution

The paper introduces BeDiscovER, a new benchmark for evaluating reasoning language models. It aggregates 52 datasets spanning the lexical, (multi-)sentential, and document levels of discourse, and includes novel challenges such as discourse particle disambiguation.

Findings

01

State-of-the-art models perform strongly on the arithmetic aspect of temporal reasoning.

02

Models struggle with full-document reasoning and subtle discourse phenomena.

03

GPT-5-mini shows strong arithmetic reasoning but limited discourse understanding.

Abstract

We introduce BeDiscovER (Benchmark of Discourse Understanding in the Era of Reasoning Language Models), an up-to-date, comprehensive suite for evaluating the discourse-level knowledge of modern LLMs. BeDiscovER compiles 5 publicly available discourse tasks across the discourse lexicon, (multi-)sentential, and document levels, with 52 individual datasets in total. It covers both extensively studied tasks, such as discourse parsing and temporal relation extraction, and novel challenges, such as discourse particle disambiguation (e.g., "just"). It also aggregates a shared task on Discourse Relation Parsing and Treebanking for multilingual and multi-framework discourse relation classification. We evaluate open-source LLMs (the Qwen3 series and DeepSeek-R1) and frontier models such as GPT-5-mini on BeDiscovER, and find that state-of-the-art models exhibit strong performance in arithmetic…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications