MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable Distant Sentiment Supervision

Patrick Huber; Giuseppe Carenini

arXiv:2011.03017·cs.CL·November 6, 2020

TL;DR

This paper introduces MEGA-DT, a large-scale discourse treebank generated via distant supervision from sentiment data, enabling improved discourse parsing across domains.

Contribution

The paper presents a scalable method, based on heuristic beam search with a stochastic component, that automatically creates discourse treebanks with structure and nuclearity, expanding the resources available for RST-style discourse parsing.

Findings

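01 A parser trained on MEGA-DT shows promising inter-domain performance gains over parsers trained on human-annotated discourse corpora.

02 The generated discourse trees include both structure and nuclearity.

03 The method produces large-scale discourse annotation without manual discourse labeling, relying on distant supervision from sentiment-annotated datasets.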

Abstract

The lack of large and diverse discourse treebanks hinders the application of data-driven approaches, such as deep-learning, to RST-style discourse parsing. In this work, we present a novel scalable methodology to automatically generate discourse treebanks using distant supervision from sentiment-annotated datasets, creating and publishing MEGA-DT, a new large-scale discourse-annotated corpus. Our approach generates discourse trees incorporating structure and nuclearity for documents of arbitrary length by relying on an efficient heuristic beam-search strategy, extended with a stochastic component. Experiments on multiple datasets indicate that a discourse parser trained on our MEGA-DT treebank delivers promising inter-domain performance gains when compared to parsers trained on human-annotated discourse corpora.
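To make the idea of a heuristic beam search with a stochastic component more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the per-EDU sentiment scores, the nuclearity weights in `NUCLEARITY_WEIGHTS`, the `score` heuristic, and every function and parameter name are assumptions chosen for illustration. The sketch merges adjacent subtrees bottom-up, keeps the best few partial forests according to how close their aggregated sentiment is to the document-level label, and additionally keeps a few randomly sampled candidates for exploration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed weight of the LEFT child when aggregating sentiment for each
# nuclearity label. The values are illustrative, not taken from the paper.
NUCLEARITY_WEIGHTS = {"N-S": 0.75, "S-N": 0.25, "N-N": 0.5}


@dataclass(frozen=True)
class Node:
    start: int                      # index of the first EDU covered
    end: int                        # index one past the last EDU covered
    sentiment: float                # aggregated sentiment of the span
    nuclearity: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


Forest = Tuple[Node, ...]


def merge(left: Node, right: Node, nuclearity: str) -> Node:
    """Combine two adjacent subtrees; the nucleus contributes more sentiment."""
    w = NUCLEARITY_WEIGHTS[nuclearity]
    sentiment = w * left.sentiment + (1 - w) * right.sentiment
    return Node(left.start, right.end, sentiment, nuclearity, left, right)


def score(forest: Forest, gold: float) -> float:
    """Heuristic (lower is better): distance between the span-length-weighted
    mean sentiment of the partial forest and the document-level label."""
    total = sum(n.end - n.start for n in forest)
    estimate = sum(n.sentiment * (n.end - n.start) for n in forest) / total
    return abs(estimate - gold)


def beam_search(edu_sentiments: List[float], gold: float,
                beam_size: int = 4, explore: int = 1, seed: int = 0) -> Node:
    """Build one binary tree over the EDUs with beam search plus random picks."""
    rng = random.Random(seed)
    leaves = tuple(Node(i, i + 1, s) for i, s in enumerate(edu_sentiments))
    beam: List[Forest] = [leaves]
    for _ in range(len(edu_sentiments) - 1):
        candidates: List[Forest] = []
        for forest in beam:
            for i in range(len(forest) - 1):
                for nuclearity in NUCLEARITY_WEIGHTS:
                    merged = merge(forest[i], forest[i + 1], nuclearity)
                    candidates.append(forest[:i] + (merged,) + forest[i + 2:])
        # Drop forests reached through different merge orders, keeping order.
        candidates = list(dict.fromkeys(candidates))
        candidates.sort(key=lambda f: score(f, gold))
        kept, rest = candidates[:beam_size], candidates[beam_size:]
        # Stochastic component: also keep a few randomly chosen candidates.
        kept += rng.sample(rest, min(explore, len(rest)))
        beam = kept
    return min(beam, key=lambda f: score(f, gold))[0]


if __name__ == "__main__":
    root = beam_search([0.9, -0.2, 0.4, 0.8], gold=0.7)
    print(root.nuclearity, round(root.sentiment, 3))
```

Because the beam is capped at `beam_size + explore` forests per step, the cost of this sketch grows polynomially with document length instead of enumerating every possible tree, which is the general motivation for using beam search on documents of arbitrary length.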

Code & Models

Repositories

nlpat/MEGA-DT (PyTorch)
