TL;DR
This paper presents a bilingual end-to-end RST discourse parser trained on a parallel Russian-English corpus. It reports state-of-the-art results on English and Russian RST corpora and shows effective cross-lingual transfer in discourse parsing.
Contribution
Introduces a manually annotated parallel Russian annotation of the English GUM RST corpus and develops an end-to-end RST parser that is effective in both monolingual and bilingual settings.
Findings
Achieves state-of-the-art results on English and Russian RST corpora.
Demonstrates effective cross-lingual transfer with limited second-language data.
Provides, to the authors' knowledge, the first evaluation of cross-lingual end-to-end RST parsing on a manually annotated parallel corpus.
Abstract
Discourse parsing is a crucial task in natural language processing that aims to reveal the higher-level relations in a text. Despite growing interest in cross-lingual discourse parsing, challenges persist due to limited parallel data and inconsistencies in the Rhetorical Structure Theory (RST) application across languages and corpora. To address this, we introduce a parallel Russian annotation for the large and diverse English GUM RST corpus. Leveraging recent advances, our end-to-end RST parser achieves state-of-the-art results on both English and Russian corpora. It demonstrates effectiveness in both monolingual and bilingual settings, successfully transferring even with limited second-language annotation. To the best of our knowledge, this work is the first to evaluate the potential of cross-lingual end-to-end RST parsing on a manually annotated parallel corpus.
