LAST at SemEval-2021 Task 1: Improving Multi-Word Complexity Prediction Using Bigram Association Measures

Yves Bestgen

arXiv:2105.09653·cs.CL·May 21, 2021

TL;DR

This paper presents LAST's system for the SemEval-2021 Lexical Complexity Prediction task. It is a LightGBM model that uses features from frequency lists, lexical norms, psychometric data, and bigram association measures. It performs respectably on the multi-word subtask but less well on the single-word subtask.

Contribution

The paper combines bigram association measures with frequency, lexical-norm, and psychometric features to address multi-word complexity prediction.
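A bigram association measure scores how strongly two words co-occur compared with chance. The sketch below computes two standard measures from raw corpus counts: pointwise mutual information (PMI) and the t-score. This summary does not say which measures the paper actually used or which corpus supplied the counts, so the choice of PMI and t-score and the function name `bigram_association` are illustrative assumptions.

```python
import math


def bigram_association(f_xy: int, f_x: int, f_y: int, n: int) -> dict[str, float]:
    """PMI and t-score for the bigram (x, y), computed from corpus counts.

    Illustrative choice of measures (assumption, not taken from the paper).
    f_xy: frequency of the bigram "x y"
    f_x, f_y: frequencies of the first and second word
    n: corpus size (number of bigram positions)
    """
    if min(f_xy, f_x, f_y, n) <= 0:
        raise ValueError("all counts must be positive")
    # Co-occurrence count expected if x and y were independent.
    expected = f_x * f_y / n
    # PMI: log2 of observed over expected co-occurrence.
    pmi = math.log2(f_xy / expected)
    # t-score: observed minus expected, scaled by an estimate of its std. deviation.
    t_score = (f_xy - expected) / math.sqrt(f_xy)
    return {"pmi": pmi, "t_score": t_score}


scores = bigram_association(f_xy=30, f_x=100, f_y=200, n=1_000_000)
print(scores)
```

A high PMI flags a pair that co-occurs far more often than chance predicts, as a fixed expression does. That is one plausible reason such scores could carry signal about how hard a multi-word target is to understand.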

Findings

01

Bigram association measures provided limited improvement.

02

The system performed respectably on the multi-word task but less well on the single-word task.

03

Sentence length was the only contextual feature the system used.

Abstract

This paper describes the system developed by the Laboratoire d'analyse statistique des textes (LAST) for the Lexical Complexity Prediction shared task at SemEval-2021. The proposed system consists of a LightGBM model fed with features obtained from many word frequency lists, published lexical norms and psychometric data. To address the specificity of the multi-word task, it also uses bigram association measures. Although the only contextual feature used was sentence length, the system achieved a respectable performance on the multi-word task, though a weaker one on the single-word task. The bigram association measures were found useful, but only to a limited extent.
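To give a concrete sense of how word-level features and association scores might be combined for a two-word target, the sketch below builds one flat feature row. Everything specific here is a hypothetical assumption: the lookup table `WORD_FEATURES` and its made-up values, the feature names `log_freq` and `aoa` (age of acquisition, as an example lexical norm), the min/max/mean aggregation over the two words, and whitespace tokenization for sentence length. The abstract does not describe how the paper actually aggregates features.

```python
# Hypothetical per-word lookup. In the described system such values come from
# frequency lists, lexical norms and psychometric data; these numbers are made up.
WORD_FEATURES = {
    "red": {"log_freq": 5.1, "aoa": 3.2},
    "tape": {"log_freq": 3.9, "aoa": 5.8},
}


def multiword_features(w1: str, w2: str, sentence: str, assoc: dict) -> dict:
    """Build one flat feature row for a two-word target (hypothetical scheme)."""
    row = {}
    f1, f2 = WORD_FEATURES[w1], WORD_FEATURES[w2]
    # Aggregate each word-level feature over the two words (assumed min/max/mean).
    for name in f1:
        a, b = f1[name], f2[name]
        row[f"{name}_min"] = min(a, b)
        row[f"{name}_max"] = max(a, b)
        row[f"{name}_mean"] = (a + b) / 2
    # Add bigram association scores, e.g. {"pmi": ..., "t_score": ...}.
    row.update(assoc)
    # Sentence length as the single contextual feature (whitespace tokens, assumed).
    row["sentence_length"] = len(sentence.split())
    return row


row = multiword_features(
    "red", "tape", "They cut through the red tape quickly", {"pmi": 7.0, "t_score": 4.0}
)
print(row)
```

Rows like this could then be collected into a matrix and fitted with a gradient-boosting regressor such as `lightgbm.LGBMRegressor`. That step is left out to keep the sketch dependency-free.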