Extending Automatic Discourse Segmentation for Texts in Spanish to Catalan
Iria da Cunha, Eric SanJuan, Juan-Manuel Torres-Moreno, Irene Castellón

TL;DR
This paper introduces the first discourse segmenter for Catalan, adapting RST-based methods from Spanish and demonstrating promising results on manually segmented texts.
Contribution
It develops the first discourse segmenter for Catalan by adapting existing RST-based segmentation rules for Spanish, filling a gap in NLP tools for Catalan.
Findings
The system achieves promising results on a gold standard corpus
Adapts Spanish discourse segmentation rules to Catalan using lexical and syntactic information
First discourse segmenter developed for Catalan texts
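The rule-adaptation idea in the second finding can be illustrated with a small sketch. This is not the authors' system. It is a minimal example built on several assumptions. The Spanish-to-Catalan marker lexicon `ES_TO_CA_MARKERS` is hypothetical, as are the helpers `translate_rules` and `segment`. The single syntactic constraint is also hypothetical: split before a discourse marker only if both sides contain a verb. Finally, the input is assumed to be tokenized already and tagged with simplified Universal Dependencies-style POS labels.

```python
from dataclasses import dataclass

# Hypothetical bilingual lexicon: Spanish discourse markers -> Catalan equivalents.
# The real marker lists and rules used in the paper are not reproduced here.
ES_TO_CA_MARKERS = {
    "porque": "perquè",
    "aunque": "encara que",
    "sin embargo": "tanmateix",
    "por lo tanto": "per tant",
    "mientras": "mentre",
}


@dataclass
class Token:
    form: str
    pos: str  # simplified UD-style tag, e.g. "VERB", "AUX", "SCONJ", "PUNCT"


def translate_rules(es_markers, lexicon):
    """Lexical step: map each Spanish marker rule to its Catalan counterpart.
    Markers with no lexicon entry are dropped (no Catalan rule is produced)."""
    return [lexicon[m] for m in es_markers if m in lexicon]


def is_marker_at(tokens, i, markers):
    """True if a (possibly multi-word) marker starts at token position i."""
    for m in markers:
        words = m.split()
        window = [t.form.lower() for t in tokens[i:i + len(words)]]
        if window == words:
            return True
    return False


def has_verb(tokens):
    """Syntactic step: a candidate segment must contain a main verb."""
    return any(t.pos == "VERB" for t in tokens)


def segment(tokens, markers):
    """Split before a marker only when both resulting spans contain a verb."""
    segments, start = [], 0
    for i in range(1, len(tokens)):
        if is_marker_at(tokens, i, markers):
            left, right = tokens[start:i], tokens[i:]
            if has_verb(left) and has_verb(right):
                segments.append(left)
                start = i
    segments.append(tokens[start:])
    return segments
```

For example, with the Catalan markers derived from `["porque", "aunque", "por lo tanto"]`, the sentence "No vam sortir perquè plovia molt ." is split before "perquè", while "Ho faig encara que cansat ." stays whole because the span after "encara que" has no verb. Keeping the lexical mapping separate from the syntactic check mirrors the division the finding describes, though the paper's actual rules are certainly richer.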
Abstract
At present, automatic discourse analysis is a relevant research topic in the field of NLP. However, discourse is one of the most difficult phenomena to process. Although discourse parsers have already been developed for several languages, no such tool exists for Catalan. The first step towards implementing this kind of parser is to develop a discourse segmenter. In this article we present the first discourse segmenter for texts in Catalan. The segmenter is based on Rhetorical Structure Theory (RST) and adapts an existing segmenter for Spanish, using lexical and syntactic information to translate rules valid for Spanish into rules for Catalan. We evaluated the system on a gold standard corpus of manually segmented texts, and the results are promising.
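The abstract does not say which metrics were computed against the gold standard. Segmenters are commonly scored with precision, recall and F-measure over segment boundaries. The sketch below assumes that choice and represents each segmentation by the token offsets where a new segment starts. The function names `boundaries` and `boundary_prf` are hypothetical.

```python
from itertools import accumulate


def boundaries(segment_lengths):
    """Token offsets where a new segment starts (the end of the text is excluded)."""
    return list(accumulate(segment_lengths))[:-1]


def boundary_prf(predicted, gold):
    """Precision, recall and F1 of predicted boundaries against gold boundaries."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # boundaries placed exactly where the annotators put them
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, a system output with segments of 3, 7, 5 and 5 tokens has boundaries at offsets 3, 10 and 15. Against gold boundaries at 3, 12 and 15, two of three boundaries match, which gives precision, recall and F1 of 2/3 each. Exact-match scoring is strict. Some evaluations also credit near misses, and that would require a different comparison function.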
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
