LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction

Abhinandan Desai; Kai North; Marcos Zampieri; Christopher M. Homan

arXiv:2105.08780·cs.CL·May 20, 2021


TL;DR

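This paper presents a logistic regression system that uses psycholinguistic features, n-grams, word frequency, and POS tags to predict the complexity of single words in context. It is evaluated on the augmented CompLex dataset from SemEval-2021 Task 1 using mean absolute error, mean squared error, Pearson correlation, and Spearman correlation.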

Contribution

It combines a wide range of linguistic features (psycholinguistic features, n-grams, word frequency, POS tags) in a single system and analyzes the impact of each feature type on lexical complexity prediction.
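
As a rough illustration of what a word-level feature extractor of this kind could look like, the sketch below builds a flat feature dictionary from word length, a smoothed log frequency, and character bigrams. It is not the authors' implementation: the names `TOY_FREQ`, `char_ngrams`, and `extract_features` are hypothetical, the frequency table is made-up toy data, and psycholinguistic and POS features are left out because they need external resources.

```python
import math
from collections import Counter

# Hypothetical toy frequency table (illustrative counts, not from the paper).
TOY_FREQ = {"the": 50000.0, "river": 120.0, "tributary": 4.0}


def char_ngrams(word, n=2):
    """Return a Counter of character n-grams, padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def extract_features(word, freq_table=TOY_FREQ):
    """Map a target word to a flat dict of numeric features."""
    w = word.lower()
    features = {
        "length": float(len(w)),
        # Add-one smoothing so unseen words get log(1) = 0 instead of an error.
        "log_freq": math.log(freq_table.get(w, 0.0) + 1.0),
    }
    for gram, count in char_ngrams(w).items():
        features[f"bigram={gram}"] = float(count)
    return features


if __name__ == "__main__":
    for token in ("river", "tributary"):
        print(token, extract_features(token))
```

A dictionary of named features like this can be passed to any vectorizer that accepts sparse key-value input before a model is fitted; that step is omitted here.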

Findings

01. Linguistic features significantly improve prediction accuracy.

02. Psycholinguistic features are highly influential.

03. The system achieves competitive correlation scores.

Abstract

This paper describes team LCP-RIT's submission to SemEval-2021 Task 1: Lexical Complexity Prediction (LCP). The task organizers provided participants with an augmented version of CompLex (Shardlow et al., 2020), an English multi-domain dataset in which words in context were annotated for complexity on a five-point Likert scale. Our system uses logistic regression and a wide range of linguistic features (e.g. psycholinguistic features, n-grams, word frequency, POS tags) to predict the complexity of single words in this dataset. We analyze the impact of different linguistic features on classification performance, and we evaluate the results in terms of mean absolute error, mean squared error, Pearson correlation, and Spearman correlation.
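
The four evaluation metrics named in the abstract can be sketched in plain Python as follows. This is a minimal illustration rather than the task's official scorer: the function names are chosen here for clarity, and the gold and predicted scores in the usage example are invented values, assumed to be floats on a common scale.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between gold and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between gold and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    gold = [0.0, 0.25, 0.5, 0.75]   # invented example scores
    pred = [0.1, 0.2, 0.6, 0.7]     # invented example predictions
    print("MAE     ", mean_absolute_error(gold, pred))
    print("MSE     ", mean_squared_error(gold, pred))
    print("Pearson ", pearson(gold, pred))
    print("Spearman", spearman(gold, pred))
```

The two error metrics measure how far predictions are from the gold scores, while the two correlations measure whether predictions rise and fall with them, so a system can do well on one pair and poorly on the other.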

Taxonomy

Methods: Logistic Regression