What's Hard in English RST Parsing? Predictive Models for Error Analysis

Yang Janet Liu; Tatsuya Aoyama; Amir Zeldes

arXiv:2309.04940·cs.CL·September 12, 2023

TL;DR

This paper investigates the main challenges in English RST discourse parsing, highlighting long-distance relations as the primary difficulty and introducing models to predict parsing errors with over 76% accuracy.

Contribution

It identifies key factors affecting RST parsing performance and provides annotated test sets and predictive models to analyze error sources.

Findings

01. Long-distance dependencies are the main challenge in RST parsing.

02. The explicit/implicit relation distinction influences parsing difficulty.

03. Predictive models achieve over 76% accuracy in error prediction.

Abstract

Despite recent advances in Natural Language Processing (NLP), hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging, and our understanding of the reasons for this is as yet limited. In this paper, we examine and model some of the factors associated with parsing difficulties in previous work: the existence of implicit discourse relations, challenges in identifying long-distance relations, out-of-vocabulary items, and more. In order to assess the relative importance of these variables, we also release two annotated English test-sets with explicit correct and distracting discourse markers associated with gold standard RST relations. Our results show that as in shallow discourse parsing, the explicit/implicit distinction plays a role, but that long-distance dependencies are the main challenge, while lack of lexical overlap is less of a problem,…

Code & Models

Repositories

janetlauyeung/nlperrors4rst (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification