ESURF: Simple and Effective EDU Segmentation

Mohammadreza Sediqin; Shlomo Engelson Argamon

arXiv:2501.07723 · cs.CL · January 15, 2025

TL;DR

This paper introduces a simple, effective method for EDU segmentation that classifies candidate boundaries with a random forest over lexical and character n-gram features. It outperforms existing segmentation methods and improves results when used inside a discourse parser.

Contribution

The paper presents a novel, straightforward approach for EDU segmentation that leverages lexical and character n-grams, demonstrating superior performance over previous techniques.

Findings

01 Outperforms existing segmentation methods

02 Enhances discourse parser accuracy

03 Highlights the importance of lexical features

Abstract

Segmenting text into Elementary Discourse Units (EDUs) is a fundamental task in discourse parsing. We present a new, simple method for identifying EDU boundaries, and hence for segmenting text into EDUs, based on lexical and character n-gram features and random forest classification. We show that the method, despite its simplicity, outperforms other methods both for segmentation alone and within a state-of-the-art discourse parser. This indicates the importance of such features for identifying basic discourse elements and points towards potentially more training-efficient methods for discourse analysis.
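The abstract frames segmentation as boundary classification over lexical and character n-gram features. The sketch below is a minimal illustration under our own assumptions, not the authors' feature set or code: it treats every token as a candidate EDU start, derives 0/1 labels from gold EDUs, and builds sparse features from a one-word window on each side plus character trigrams of the current and previous token. The function names (`tokens_and_labels`, `char_ngrams`, `token_features`), the whitespace tokenization, the window size, and the trigram length are all hypothetical choices.

```python
from typing import Dict, List, Tuple


def tokens_and_labels(edus: List[str]) -> Tuple[List[str], List[int]]:
    """Flatten gold EDUs into tokens; label 1 where a token begins an EDU.

    Assumption: simple whitespace tokenization, for illustration only.
    """
    tokens: List[str] = []
    labels: List[int] = []
    for edu in edus:
        for i, word in enumerate(edu.split()):
            tokens.append(word)
            labels.append(1 if i == 0 else 0)
    return tokens, labels


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Character n-grams of a lowercased word padded with < and > markers."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, float]:
    """Sparse lexical + character n-gram features for 'does token i start an EDU?'.

    Hypothetical feature set: surrounding words in a +/- window, plus
    character trigrams of the current token and of the previous token.
    """
    feats: Dict[str, float] = {"bias": 1.0}
    # Lexical features: the word at each offset, or a pad symbol past the edges.
    for offset in range(-window, window + 1):
        j = i + offset
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats[f"w[{offset}]={word}"] = 1.0
    # Character trigrams of the current token.
    for gram in char_ngrams(tokens[i]):
        feats[f"c3[0]={gram}"] = 1.0
    # Character trigrams of the previous token (e.g. trailing punctuation).
    if i > 0:
        for gram in char_ngrams(tokens[i - 1]):
            feats[f"c3[-1]={gram}"] = 1.0
    return feats


if __name__ == "__main__":
    toks, labs = tokens_and_labels(["Although it rained,", "we went out."])
    rows = [token_features(toks, i) for i in range(len(toks))]
    for tok, lab, row in zip(toks, labs, rows):
        print(lab, tok, len(row))
```

To train a classifier on these rows, one option is to vectorize the feature dicts with scikit-learn's `DictVectorizer` and fit a `RandomForestClassifier` on the 0/1 labels; this pairing is our assumption, since the abstract does not state which toolkit or feature templates ESURF uses. Sparse string-keyed dicts keep the feature templates easy to extend without fixing a vocabulary in advance.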

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning