Improving Lexical Difficulty Prediction with Context-Aligned Contrastive Learning and Ridge Ensembling
Wicaksono Leksono Muhamad, Joanito Agili Lopo, Tsamarah Rana Nugraha, Ahmad Cahyono Adi, Muhammad Oriza Nurfajri

TL;DR
This paper introduces a novel contrastive learning approach combined with Ridge ensembling to improve cross-lingual lexical difficulty prediction, capturing the ordinal structure of difficulty and reducing systematic prediction bias.
Contribution
It proposes Context-Aligned Contrastive Regression, which combines two contrastive objectives (Cross-View Context and Ordinal Soft Contrastive Learning) with a Ridge regression ensemble to model lexical difficulty across languages.
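The summary does not say how the Ridge ensemble is built. A minimal sketch, assuming each ensemble member is a closed-form Ridge regressor fit on its own feature set (for example, embeddings from a different encoder or view) and that member predictions are simply averaged, looks like this in plain Python. The names `fit_ridge`, `ridge_ensemble`, and the per-member feature layout are hypothetical and are not the authors' code.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ridge(xs: Matrix, ys: Sequence[float], alpha: float) -> List[float]:
    """Closed-form Ridge: w = (X^T X + alpha * I)^-1 X^T y.

    The last weight is an intercept, which is left unpenalized.
    """
    rows = [list(x) + [1.0] for x in xs]  # append a bias column
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(d - 1):  # L2 penalty on feature weights only
        xtx[i][i] += alpha
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def ridge_ensemble(
    members: List[Matrix], ys: Sequence[float], alpha: float = 1.0
) -> Callable[[List[List[float]]], float]:
    """Hypothetical ensemble: one Ridge model per feature set, predictions averaged."""
    weights = [fit_ridge(xs, ys, alpha) for xs in members]

    def ensemble_predict(inputs: List[List[float]]) -> float:
        # inputs[k] is the feature vector for member k (same layout as members[k]).
        preds = [predict(w, x) for w, x in zip(weights, inputs)]
        return sum(preds) / len(preds)

    return ensemble_predict
```

Averaging several independently regularized regressors is one simple way an ensemble can damp the systematic over- or under-prediction of any single member; the paper may combine its members differently.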
Findings
Contrastive objectives enhance cross-lingual representation alignment.
Learned representations effectively capture ordinal difficulty structure.
The Ridge ensemble reduces systematic biases, improving stability across difficulty levels.
Abstract
Lexical difficulty prediction is a fundamental problem in language learning and readability assessment, requiring models to estimate word difficulty across different first-language (L1) backgrounds. However, existing approaches rely on regression-only training with scalar supervision, which does not explicitly structure the representation space, limiting their ability to capture cross-lingual alignment and ordinal difficulty. To mitigate these issues, we propose Context-Aligned Contrastive Regression, which integrates a Ridge regression ensemble with two complementary objectives: Cross-View Context and Ordinal Soft Contrastive Learning. Experiments on three L1 datasets show that (i) contrastive objectives improve cross-lingual representation alignment while preserving language-specific nuances, (ii) the learned representations capture the ordinal structure of lexical difficulty, and…
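The abstract names an Ordinal Soft Contrastive objective but does not give its formula. One common way to make a contrastive loss ordinal, sketched below as an assumption rather than the authors' definition, is to replace hard positive/negative labels with a soft target distribution in which pairs with closer difficulty scores receive more weight, then match it with a softmax over embedding similarities using cross-entropy. The function name and the temperatures `tau` and `label_tau` are hypothetical choices.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ordinal_soft_contrastive_loss(
    embs: List[Sequence[float]],
    labels: Sequence[float],
    tau: float = 0.1,
    label_tau: float = 0.5,
) -> float:
    """Hypothetical soft contrastive loss with ordinal (difficulty-distance) targets."""
    n = len(embs)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Target: items whose difficulty is closer to item i get more probability mass.
        target = softmax([-abs(labels[i] - labels[j]) / label_tau for j in others])
        # Model: softmax over temperature-scaled cosine similarities to the other items.
        pred = softmax([cosine(embs[i], embs[j]) / tau for j in others])
        # Cross-entropy between the soft target and the model distribution.
        total += -sum(q * math.log(p) for q, p in zip(target, pred))
    return total / n
```

Lowering `label_tau` concentrates the target on the nearest difficulty neighbour, so the loss approaches an ordinary single-positive InfoNCE objective; larger values spread credit over a wider band of similar difficulties.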
