Discovering Elementary Discourse Units in Textual Data Using Canonical Correlation Analysis

Akanksha Mehndiratta; Krishna Asawa

arXiv:2406.12997 · cs.CL · May 30, 2025

Open Access

TL;DR

This paper introduces an unsupervised, linear, and language-independent model using Canonical Correlation Analysis to identify Elementary Discourse Units in text, demonstrating competitive performance in textual similarity tasks.

Contribution

It proposes a novel unsupervised EDU segmentation method based on CCA, with a strong theoretical foundation and practical effectiveness in content selection tasks.

Findings

01. EDUs deliver competitive results in textual similarity tasks

02. The model outperforms some supervised techniques despite its simplicity

03. The approach is adaptable and language-independent

Abstract

Canonical Correlation Analysis (CCA) has been widely used for learning latent representations in various fields. This study goes a step further by demonstrating the potential of CCA for identifying Elementary Discourse Units (EDUs) that capture the latent information within textual data. The probabilistic interpretation of CCA discussed in this study exploits the two-view nature of textual data, i.e., consecutive sentences in a document or turns in a dyadic conversation, and has a strong theoretical foundation. Furthermore, this study proposes a model for Elementary Discourse Unit (EDU) segmentation that discovers EDUs in textual data without any supervision. To validate the model, the EDUs are used as the textual unit for content selection in a textual similarity task. Empirical results on the Semantic Textual Similarity (STSB) and Mohler datasets confirm that, despite…
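To make the two-view idea concrete, the following is a minimal sketch of classical linear CCA in NumPy. It is not the authors' probabilistic formulation or their EDU segmentation model; it only illustrates how two paired views can be projected into a shared latent space. The function name `cca`, the regularization term `reg`, and the synthetic matrices `X` and `Y` (stand-ins for feature vectors of a sentence and of the sentence that follows it) are all assumptions made for illustration.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca(X, Y, k, reg=1e-6):
    """Top-k canonical correlations and projections for paired views X, Y.

    X: (n, dx) array, Y: (n, dy) array; row i of X is paired with row i of Y.
    reg is a small ridge term (an assumption) that keeps covariances invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                      # center each view
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)                    # cross-covariance
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)        # whitening transforms
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, returned in descending order.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]                            # projection for view X
    B = Wy @ Vt.T[:, :k]                         # projection for view Y
    return s[:k], A, B

# Synthetic stand-in data: both views share one latent signal z plus noise.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = z @ np.array([[1.0, 0.5, -0.3]]) + 0.1 * rng.normal(size=(200, 3))
Y = z @ np.array([[0.8, -0.4, 0.2, 0.6]]) + 0.1 * rng.normal(size=(200, 4))

corrs, A, B = cca(X, Y, k=2)
print(corrs)  # first correlation is high because the views share z
```

In this toy setup the first canonical direction recovers the shared signal, which is the sense in which CCA exposes latent information common to two views. How sentence features are built and how the latent space is turned into EDU boundaries is specific to the paper and is not reproduced here.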

Taxonomy

Topics: Semantic Web and Ontologies