Detecting Multiword Expression Type Helps Lexical Complexity Assessment

Ekaterina Kochmar, Sian Gooding, and Matthew Shardlow

arXiv:2005.05692 · cs.CL · May 13, 2020 · 1 citation

TL;DR

This paper re-annotates the Complex Word Identification (CWI) Shared Task 2018 dataset with MWE types, identifies which MWE types are most problematic for native and non-native readers, and shows that MWE type information improves lexical complexity prediction.

Contribution

It releases an MWE-annotated version of the CWI Shared Task 2018 dataset for lexical complexity and demonstrates that MWE type information improves complexity assessment systems.

Findings

01. MWE type annotation improves complexity prediction accuracy.
02. Certain MWE types are more problematic for non-native readers.
03. The dataset is a valuable resource for text simplification research.
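Finding 02, like the abstract's question of which expressions are most problematic for native and non-native readers, comes down to comparing complexity across MWE types separately for each reader group. The sketch below shows one way to run that comparison. It assumes a simplified record that stores, per reader group, how many annotators marked a phrase as complex. The `Annotation` fields, the type labels, and the toy counts are all hypothetical and do not come from the released dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Annotation:
    """Hypothetical simplified record for one annotated target phrase."""
    phrase: str
    mwe_type: str           # hypothetical MWE type label
    native_complex: int     # native annotators who marked the phrase complex
    native_total: int       # native annotators who saw the phrase
    nonnative_complex: int  # non-native annotators who marked it complex
    nonnative_total: int    # non-native annotators who saw the phrase


def complexity_by_type(rows: list[Annotation]) -> dict[str, dict[str, float]]:
    """Mean share of annotators marking a phrase complex, per MWE type and reader group."""
    shares = defaultdict(lambda: {"native": [], "non-native": []})
    for r in rows:
        shares[r.mwe_type]["native"].append(r.native_complex / r.native_total)
        shares[r.mwe_type]["non-native"].append(r.nonnative_complex / r.nonnative_total)
    return {
        mwe_type: {group: mean(vals) for group, vals in groups.items()}
        for mwe_type, groups in shares.items()
    }


# Toy counts invented for illustration; they are not taken from the dataset.
rows = [
    Annotation("take over", "phrasal_verb", 1, 10, 4, 10),
    Annotation("give up", "phrasal_verb", 0, 10, 2, 10),
    Annotation("prime minister", "compound", 0, 10, 1, 10),
]
summary = complexity_by_type(rows)
# The type with the highest mean share of non-native "complex" judgements.
hardest_for_nonnative = max(summary, key=lambda t: summary[t]["non-native"])
```

With these toy counts, `hardest_for_nonnative` is `"phrasal_verb"`. Using shares of annotators rather than raw counts keeps the two groups comparable even when they differ in size.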

Abstract

Multiword expressions (MWEs) represent lexemes that should be treated as single lexical units due to their idiosyncratic nature. Multiple NLP applications have been shown to benefit from MWE identification; however, the lexical complexity of MWEs remains under-explored. In this work, we re-annotate the Complex Word Identification Shared Task 2018 dataset of Yimam et al. (2017), which provides complexity scores for a range of lexemes, with MWE types. We release the MWE-annotated dataset with this paper, and we believe it is a valuable resource for the text simplification community. In addition, we investigate which types of expressions are most problematic for native and non-native readers. Finally, we show that a lexical complexity assessment system benefits from information about MWE types.
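The abstract does not describe how MWE types are fed to the complexity system. A common approach is to one-hot encode the type and append it to the existing features, and the sketch below assumes that approach. The type inventory `MWE_TYPES` and the two baseline features are hypothetical placeholders, not the paper's feature set.

```python
from typing import Sequence

# Hypothetical MWE type inventory for illustration; the paper's label set may differ.
MWE_TYPES = ("none", "compound", "phrasal_verb", "named_entity", "idiom")


def one_hot(label: str, labels: Sequence[str]) -> list[float]:
    """One-hot encode a categorical label; an unknown label becomes all zeros."""
    return [1.0 if label == candidate else 0.0 for candidate in labels]


def baseline_features(phrase: str) -> list[float]:
    """Hypothetical baseline features: character length and token count."""
    return [float(len(phrase)), float(len(phrase.split()))]


def complexity_features(phrase: str, mwe_type: str, use_mwe_type: bool = True) -> list[float]:
    """Baseline features, optionally extended with a one-hot MWE type block."""
    feats = baseline_features(phrase)
    if use_mwe_type:
        feats += one_hot(mwe_type, MWE_TYPES)
    return feats
```

For example, `complexity_features("take over", "phrasal_verb")` returns `[9.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0]`. You can run a simple ablation by training the same classifier or regressor once with `use_mwe_type=True` and once with `use_mwe_type=False`, then checking whether the type block helps.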

Code & Models

Repositories

ekochmar/MWE-CWI (official)

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling