Automatic Identification of AltLexes using Monolingual Parallel Corpora
Elnaz Davoodi, Leila Kosseim

TL;DR
This paper presents a novel method leveraging parallel corpora and lexical resources to automatically identify AltLexes, enhancing discourse relation detection beyond traditional connectives.
Contribution
It introduces a new approach that uses monolingual parallel corpora and lexical databases to discover AltLexes signaling discourse relations.
Findings
Discovered 91 AltLexes automatically
Targets discourse relations signaled by markers outside standard connective inventories
Applied method to Simple Wikipedia and Newsela corpora
Abstract
The automatic identification of discourse relations is still a challenging task in natural language processing. Discourse connectives, such as "since" or "but", are the most informative cues to identify explicit relations; however, discourse parsers typically use a closed inventory of such connectives. As a result, discourse relations signaled by markers outside these inventories (i.e., AltLexes) are not detected as effectively. In this paper, we propose a novel method to leverage parallel corpora in text simplification and lexical resources to automatically identify alternative lexicalizations that signal discourse relations. When applied to the Simple Wikipedia and Newsela corpora along with WordNet and the PPDB, the method allowed the automatic discovery of 91 AltLexes.
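The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: given aligned sentence pairs from a monolingual parallel corpus, look for cases where one side uses a known connective and the other side drops it but contains a phrase that a lexical resource lists as its paraphrase. The connective list, the paraphrase table (a toy stand-in for a resource such as the PPDB), the function names, and the example sentences are all hypothetical, and the authors' actual procedure may differ.

```python
import string
from collections import Counter

# Toy closed inventory of connectives (assumption; real inventories are larger).
CONNECTIVES = {"because", "but", "since", "although"}

# Hypothetical paraphrase table standing in for a lexical resource such as the PPDB.
PARAPHRASES = {
    "because": {"due to", "the reason is"},
}

_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(sentence):
    """Lowercase, strip punctuation, and collapse whitespace."""
    return " ".join(sentence.lower().translate(_PUNCT).split())


def candidate_altlexes(pairs):
    """Count (connective, phrase) candidates over aligned sentence pairs.

    A candidate is recorded when one side contains a connective, the other
    side does not, and the other side contains a listed paraphrase of it.
    """
    counts = Counter()
    for left, right in pairs:
        a, b = normalize(left), normalize(right)
        # Check both directions: either side may carry the connective.
        for src, tgt in ((a, b), (b, a)):
            tgt_tokens = set(tgt.split())
            for conn in set(src.split()) & CONNECTIVES:
                if conn in tgt_tokens:
                    continue  # connective preserved, nothing to learn
                for phrase in PARAPHRASES.get(conn, ()):
                    # Pad with spaces so only whole-word matches count.
                    if f" {phrase} " in f" {tgt} ":
                        counts[(conn, phrase)] += 1
    return counts


# Invented example pairs (not taken from Simple Wikipedia or Newsela).
pairs = [
    ("The match was cancelled because it rained.",
     "The match was cancelled due to rain."),
    ("Prices rose because demand increased.",
     "Prices rose. The reason is that demand increased."),
    ("She left since it was late.",
     "She left since it was late."),
]

if __name__ == "__main__":
    for (conn, phrase), n in candidate_altlexes(pairs).items():
        print(f"{conn!r} -> {phrase!r}: {n}")
```

In this sketch the third pair contributes nothing because the connective appears on both sides. A real pipeline would also need sentence alignment and a way to confirm that the candidate phrase signals the same discourse relation.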
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
