Improving Lexical Difficulty Prediction with Context-Aligned Contrastive Learning and Ridge Ensembling

Wicaksono Leksono Muhamad; Joanito Agili Lopo; Tsamarah Rana Nugraha; Ahmad Cahyono Adi; Muhammad Oriza Nurfajri

arXiv:2605.08950 · cs.CL · May 12, 2026


TL;DR

This paper combines contrastive learning objectives with Ridge regression ensembling to improve cross-lingual lexical difficulty prediction, capturing the ordinal structure of difficulty and reducing systematic bias.

Contribution

It proposes Context-Aligned Contrastive Regression, which integrates two contrastive objectives (Cross-View Context and Ordinal Soft Contrastive Learning) with a Ridge regression ensemble to model lexical difficulty across first-language (L1) backgrounds.
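The page does not say how the Ridge ensemble is assembled. The following is a minimal sketch under our own assumption (not the paper's stated design): each ensemble member is a closed-form Ridge regressor fitted on a different feature view of the same words (for example, embeddings trained with different objectives), and member predictions are averaged uniformly. `solve`, `Ridge`, and `ridge_ensemble` are hypothetical names. The code is dependency-free Python; a practical setup would more likely use scikit-learn's `Ridge` estimator.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy so the caller's matrix is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class Ridge:
    """Closed-form ridge regression with an unpenalized intercept."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.coef: List[float] = []
        self.intercept = 0.0

    def fit(self, X: Matrix, y: Sequence[float]) -> "Ridge":
        n, d = len(X), len(X[0])
        # Center features and target so the intercept is not shrunk.
        x_mean = [sum(row[j] for row in X) / n for j in range(d)]
        y_mean = sum(y) / n
        Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
        yc = [v - y_mean for v in y]
        # Normal equations: (Xc^T Xc + alpha * I) w = Xc^T yc
        gram = [[sum(Xc[i][p] * Xc[i][q] for i in range(n)) + (self.alpha if p == q else 0.0)
                 for q in range(d)] for p in range(d)]
        rhs = [sum(Xc[i][p] * yc[i] for i in range(n)) for p in range(d)]
        self.coef = solve(gram, rhs)
        self.intercept = y_mean - sum(w * mu for w, mu in zip(self.coef, x_mean))
        return self

    def predict(self, X: Matrix) -> List[float]:
        return [self.intercept + sum(w * v for w, v in zip(self.coef, row)) for row in X]


def ridge_ensemble(views_train: Sequence[Matrix], y: Sequence[float],
                   views_test: Sequence[Matrix], alpha: float = 1.0) -> List[float]:
    """Fit one Ridge per feature view; assumption: uniform averaging of predictions."""
    preds = [Ridge(alpha).fit(Xtr, y).predict(Xte) for Xtr, Xte in zip(views_train, views_test)]
    return [sum(col) / len(col) for col in zip(*preds)]
```

Uniform averaging is only one option; a weighted or stacked combination would be equally consistent with the description above. The `alpha` penalty shrinks each member's coefficients, which is one reason ridge members tend to give stable predictions on small datasets.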

Findings

01

Contrastive objectives enhance cross-lingual representation alignment.

02

Learned representations effectively capture ordinal difficulty structure.

03

The Ridge ensemble reduces systematic biases, improving stability across difficulty levels.

Abstract

Lexical difficulty prediction is a fundamental problem in language learning and readability assessment, requiring models to estimate word difficulty across different first-language (L1) backgrounds. However, existing approaches rely on regression-only training with scalar supervision, which does not explicitly structure the representation space and thus limits their ability to capture cross-lingual alignment and ordinal difficulty. To mitigate these issues, we propose Context-Aligned Contrastive Regression, which integrates a Ridge regression ensemble with two complementary objectives, namely Cross-View Context and Ordinal Soft Contrastive Learning. Experiments on three L1 datasets show that (i) contrastive objectives improve cross-lingual representation alignment while preserving language-specific nuances, (ii) the learned representations capture the ordinal structure of lexical difficulty, and…
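The abstract names two contrastive objectives but does not give their formulas. The following is one plausible reading, written entirely under our own assumptions. The Cross-View Context objective is modeled as a standard InfoNCE loss in which row i of one view (hypothetically, a word encoded with its sentence context) should match row i of a second view (hypothetically, the same word in another context or encoder) against all other rows. The Ordinal Soft Contrastive objective is modeled as a cross-entropy between the similarity softmax and soft targets that decay with the distance between difficulty labels. The function names and the hyperparameters `tau` (similarity temperature) and `sigma` (label-distance bandwidth) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cross_view_infonce(view_a: Sequence[Vector], view_b: Sequence[Vector],
                       tau: float = 0.1) -> float:
    """InfoNCE: view_a[i] should be most similar to view_b[i] among all rows of view_b."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / tau for j in range(n)]
        total -= log_softmax(logits)[i]
    return total / n


def ordinal_soft_contrastive(z: Sequence[Vector], y: Sequence[float],
                             tau: float = 0.1, sigma: float = 1.0) -> float:
    """Soft-target contrastive loss: pairs with closer difficulty labels get more target mass.

    Assumption (hypothetical formulation): target weight for pair (i, j) is
    proportional to exp(-|y_i - y_j| / sigma). Requires at least two items.
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        raw = [math.exp(-abs(y[i] - y[j]) / sigma) for j in others]
        norm = sum(raw)
        targets = [r / norm for r in raw]
        logp = log_softmax([cosine(z[i], z[j]) / tau for j in others])
        # Cross-entropy between soft ordinal targets and the similarity distribution.
        total -= sum(t * lp for t, lp in zip(targets, logp))
    return total / n
```

With soft targets, a pair with difficulty labels 2 and 3 acts as a partial positive rather than a hard negative, which is one way to push embeddings into an order that mirrors the difficulty scale. How the paper weights these terms against the regression objective is not stated on this page.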
