The Change that Matters in Discourse Parsing: Estimating the Impact of Domain Shift on Parser Error

Katherine Atwell; Anthony Sicilia; Seong Jae Hwang; Malihe Alikhani

arXiv:2203.11317·cs.CL·March 23, 2022

TL;DR

This paper investigates how domain shifts affect discourse parser errors and proposes a statistical measure from domain adaptation theory to better estimate model generalization across different text domains.

Contribution

The paper proposes using a statistic from the theoretical domain adaptation literature to estimate the error-gap caused by domain shift in discourse parsing, and evaluates it through a large-scale empirical study.

Findings

01. Non-news datasets transfer more easily than news datasets.

02. The proposed statistic correlates with the actual error-gap, which aids domain adaptation.

03. The study offers insights into the dataset properties that influence discourse model performance.

Abstract

Discourse analysis allows us to attain inferences of a text document that extend beyond the sentence level. The current performance of discourse models is very low on texts outside of the training distribution's coverage, diminishing the practical utility of existing models. There is a need for a measure that can inform us to what extent our model generalizes from the training to the test sample when these samples may be drawn from distinct distributions. While this can be estimated via distribution shift, we argue that this does not directly correlate with change in the observed error of a classifier (i.e., error-gap). Thus, we propose to use a statistic from the theoretical domain adaptation literature which can be directly tied to error-gap. We study the bias of this statistic as an estimator of error-gap both theoretically and through a large-scale empirical study of over 2400…
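To make the term concrete, the sketch below computes an error-gap for a single classifier, assuming it is measured as the absolute difference between the error rate on a source (training-domain) sample and on a target (test-domain) sample. This reading follows the abstract's "change in the observed error of a classifier"; the function names and the toy labels are hypothetical, and this is not the statistic the paper proposes for estimating the gap.

```python
from typing import Hashable, Sequence


def error_rate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that disagree with the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute an error rate on an empty sample")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def error_gap(
    source_true: Sequence[Hashable],
    source_pred: Sequence[Hashable],
    target_true: Sequence[Hashable],
    target_pred: Sequence[Hashable],
) -> float:
    """Assumed definition: |target error - source error| for one classifier."""
    return abs(
        error_rate(target_true, target_pred) - error_rate(source_true, source_pred)
    )


# Toy example with made-up relation labels (illustrative only).
src_gold = ["A", "B", "C", "D"]
src_pred = ["A", "B", "C", "X"]   # 1 of 4 wrong on the source sample
tgt_gold = ["A", "B", "C", "D"]
tgt_pred = ["X", "Y", "C", "D"]   # 2 of 4 wrong on the target sample

gap = error_gap(src_gold, src_pred, tgt_gold, tgt_pred)
print(f"error-gap: {gap:.2f}")
```

Computing this quantity directly requires gold labels in the target domain, which is why an estimator of it is useful when such labels are scarce.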

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification