Corruption Is Not All Bad: Incorporating Discourse Structure into Pre-training via Corruption for Essay Scoring
Farjana Sultana Mim, Naoya Inoue, Paul Reisert, Hiroki Ouchi, and Kentaro Inui

TL;DR
This paper introduces an unsupervised pre-training method that captures the discourse structure of essays without relying on discourse parsers, achieving a new state-of-the-art result on essay Organization scoring.
Contribution
It proposes a novel corruption-based pre-training approach that encodes discourse coherence and cohesion without annotated data or parsers.
Findings
Achieves state-of-the-art results on essay organization scoring.
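Works on noisy student essays, where discourse parsers often perform poorly, without depending on a parser.
Augments masked language modeling pre-training with discourse-aware pre-training to combine contextualized and discourse information.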
Abstract
Existing approaches for automated essay scoring and document representation learning typically rely on discourse parsers to incorporate discourse structure into text representation. However, the performance of parsers is not always adequate, especially when they are used on noisy texts, such as student essays. In this paper, we propose an unsupervised pre-training approach to capture the discourse structure of essays in terms of coherence and cohesion that does not require any discourse parser or annotation. We introduce several types of token-, sentence-, and paragraph-level corruption techniques for our proposed pre-training approach and augment masked language modeling pre-training with our pre-training method to leverage both contextualized and discourse information. Our proposed unsupervised approach achieves a new state-of-the-art result on the essay Organization scoring task.
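The abstract names token-, sentence-, and paragraph-level corruption but does not spell out the operations here, so the following is only a minimal sketch of what such corruptions could look like. It assumes an essay is a list of paragraphs, each a list of sentence strings, and it assumes corruption means shuffling order at each level. The function names, the shuffling choice, and the original-versus-corrupted labels (1 and 0) are hypothetical illustrations, not the authors' implementation.

```python
import random


def permute(items, rng, tries=10):
    """Return a shuffled copy of items, retrying a few times to avoid the original order."""
    original = list(items)
    shuffled = list(items)
    for _ in range(tries):
        rng.shuffle(shuffled)
        if shuffled != original:
            break
    return shuffled


def corrupt_tokens(essay, rng):
    # Token level (assumed): scramble word order inside every sentence.
    return [[" ".join(permute(s.split(), rng)) for s in para] for para in essay]


def corrupt_sentences(essay, rng):
    # Sentence level (assumed): scramble sentence order inside every paragraph.
    return [permute(para, rng) for para in essay]


def corrupt_paragraphs(essay, rng):
    # Paragraph level (assumed): scramble the order of whole paragraphs.
    return permute([list(para) for para in essay], rng)


def make_pretraining_examples(essay, seed=0):
    """Pair the original essay (label 1) with one corrupted copy per level (label 0)."""
    rng = random.Random(seed)
    examples = [(essay, 1)]
    for corrupt in (corrupt_tokens, corrupt_sentences, corrupt_paragraphs):
        examples.append((corrupt(essay, rng), 0))
    return examples


if __name__ == "__main__":
    essay = [
        ["Uniforms reduce distraction.", "Therefore students focus better."],
        ["However, they limit expression.", "In conclusion, schools should decide."],
    ]
    for text, label in make_pretraining_examples(essay, seed=1):
        print(label, text)
```

Under these assumptions, a model trained to tell label 1 from label 0 would need no parser output or human annotation, since the labels come from the corruption itself. Each function returns a new structure and leaves the input essay unchanged.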
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
