LAST at SemEval-2021 Task 1: Improving Multi-Word Complexity Prediction Using Bigram Association Measures
Yves Bestgen

TL;DR
This paper presents LAST's system for SemEval-2021 Task 1 (Lexical Complexity Prediction). The system predicts lexical complexity with a LightGBM model fed with features from word frequency lists, lexical norms, psychometric data, and bigram association measures. It achieves a respectable performance on the multi-word subtask but a weaker one on the single-word subtask.
Contribution
The paper combines bigram association measures with traditional frequency, lexical-norm, and psychometric features to handle the multi-word complexity prediction subtask.
Findings
Bigram association measures provided limited improvement.
The system performed respectably on the multi-word subtask but less well on the single-word subtask.
Sentence length was the only contextual feature used.
Abstract
This paper describes the system developed by the Laboratoire d'analyse statistique des textes (LAST) for the Lexical Complexity Prediction shared task at SemEval-2021. The proposed system is a LightGBM model fed with features obtained from many word frequency lists, published lexical norms, and psychometric data. To tackle the specificity of the multi-word task, it uses bigram association measures. Although the only contextual feature used was sentence length, the system achieved a respectable performance in the multi-word task, but a poorer one in the single-word task. The bigram association measures were found useful, but only to a limited extent.
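The abstract does not say which bigram association measures were used. As an illustration only, the sketch below computes two widely used ones for a two-word target from raw corpus counts. The first is pointwise mutual information (PMI), log2(O/E). The second is the t-score, (O - E)/sqrt(O). Here O is the observed bigram frequency and E = f(w1)·f(w2)/N is its expected frequency under independence. The function name `bigram_association`, the toy corpus, and the choice of these two measures are assumptions, not the author's implementation.

```python
import math
from collections import Counter


def bigram_association(tokens, w1, w2):
    """Return (PMI, t-score) for the bigram (w1, w2), or None if unseen.

    Illustrative sketch only: the measures and the use of N = number of
    tokens are assumptions, not necessarily what the LAST system used.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    n = len(tokens)
    observed = bigrams[(w1, w2)]
    if observed == 0:
        return None  # association is undefined for an unseen bigram
    # Expected bigram frequency if w1 and w2 occurred independently.
    expected = unigrams[w1] * unigrams[w2] / n
    pmi = math.log2(observed / expected)
    t_score = (observed - expected) / math.sqrt(observed)
    return pmi, t_score


if __name__ == "__main__":
    corpus = "the red car and the red bus and a blue car".split()
    pmi, t = bigram_association(corpus, "the", "red")
    print(round(pmi, 2), round(t, 2))  # prints 2.46 1.16
```

In a feature-based model like the one described, such scores would simply be appended as extra columns alongside the per-word frequency and norm features of the multi-word target.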
