Predicting Contextual Informativeness for Vocabulary Learning using Deep Learning

Tao Wu; Adam Kapelner

arXiv:2602.18326 · cs.CL · February 23, 2026

TL;DR

This paper presents a deep learning system that automatically identifies highly informative contexts for vocabulary learning. It compares three modeling approaches and introduces a new metric, the Retention Competency Curve, to evaluate them.

Contribution

It introduces a supervised approach that fine-tunes Qwen3 embeddings, adds a nonlinear regression head, and combines them with handcrafted context features. This approach yields large gains in selecting high-quality contexts for vocabulary instruction.
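The sketch below gives a rough illustration of the model (iii) idea. It rests on several assumptions:

- the "nonlinear regression head" is a one-hidden-layer MLP;
- the context embedding is simply concatenated with the handcrafted features;
- the embedding dimension, hidden size, and feature count are hypothetical;
- the helper names `init_head` and `predict_informativeness` are hypothetical.

Qwen3 fine-tuning and the training loss are not described on this page, so the sketch shows only a forward pass. It is not the authors' implementation.

```python
import numpy as np

def init_head(in_dim, hidden_dim, seed=0):
    """Randomly initialise a one-hidden-layer MLP regression head (sizes are hypothetical)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.02, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.02, size=hidden_dim),
        "b2": 0.0,
    }

def predict_informativeness(params, embeddings, features):
    """Forward pass only: concatenate embeddings with handcrafted features, then MLP -> one score per context."""
    x = np.concatenate([embeddings, features], axis=1)    # (n, d_emb + d_feat)
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["w2"] + params["b2"]                # (n,) informativeness scores

# Random stand-ins: real inputs would be fine-tuned Qwen3 context embeddings
# (1024-d is an assumed size) and handcrafted features (5 is an arbitrary count).
rng = np.random.default_rng(1)
emb = rng.normal(size=(4, 1024))
feats = rng.normal(size=(4, 5))
params = init_head(1024 + 5, 256)
scores = predict_informativeness(params, emb, feats)
print(scores.shape)  # (4,)
```

Dropping `features` and shrinking `in_dim` gives the analogous model (ii) head. In a supervised setup, the head would be trained against labeled informativeness targets, for example with a squared-error loss, but that choice is an assumption here.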

Findings

01

Model (iii) achieves a good-to-bad ratio of 440 while discarding only 70% of good contexts.

02

Supervised fine-tuning with Qwen3 embeddings outperforms the unsupervised, similarity-based MPNet approach.

03

The Retention Competency Curve visualizes the trade-off between the proportion of good contexts discarded and the good-to-bad context ratio.
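The following worked example makes Finding 01 concrete. It rests on two assumptions of ours: the good-to-bad ratio is (good contexts kept) / (bad contexts kept), and discarding 70% of the good contexts means keeping the other 30%. Let G be the number of good contexts in a candidate pool:

```latex
\[
\frac{G_{\text{kept}}}{B_{\text{kept}}} = \frac{0.30\,G}{B_{\text{kept}}} = 440
\quad\Longrightarrow\quad
B_{\text{kept}} = \frac{0.30\,G}{440} \approx 6.8 \times 10^{-4}\,G .
\]
```

Under these assumptions, a hypothetical pool with 10,000 good contexts would retain about 3,000 of them alongside roughly 7 bad ones.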

Abstract

We describe a modern deep learning system that automatically identifies informative contextual examples ("contexts") for first-language vocabulary instruction for high school students. Our paper compares three modeling approaches:

- (i) an unsupervised similarity-based strategy using MPNet's uniformly contextualized embeddings;
- (ii) a supervised framework built on instruction-aware, fine-tuned Qwen3 embeddings with a nonlinear regression head;
- (iii) model (ii) plus handcrafted context features.

We introduce a novel metric, the Retention Competency Curve, which visualizes the trade-off between the proportion of good contexts discarded and the "good-to-bad" context ratio. It provides a compact, unified lens on model performance. Model (iii) delivers the most dramatic gains, achieving a good-to-bad ratio of 440 while discarding only 70% of the good contexts. In summary,…
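The sketch below is a minimal way to trace a Retention Competency Curve from model scores and binary good/bad labels. It assumes the following, which is our reading of the abstract rather than the authors' definition or code:

- contexts are kept when their score meets a threshold;
- the x-coordinate is the fraction of good contexts discarded;
- the y-coordinate is (good kept) / (bad kept).

The function names `retention_competency_curve` and `best_ratio_at` are hypothetical.

```python
import numpy as np

def retention_competency_curve(scores, is_good):
    """Sweep score thresholds and return (discarded_good_fraction, good_to_bad_ratio) pairs.

    Assumed definitions (our reading of the abstract, not the authors' code):
    a context is kept when its score >= threshold; the ratio is
    (#good kept) / (#bad kept), reported as infinity when no bad context is kept.
    """
    scores = np.asarray(scores, dtype=float)
    is_good = np.asarray(is_good, dtype=bool)
    n_good = int(is_good.sum())
    if n_good == 0:
        raise ValueError("need at least one good context")
    points = []
    # Distinct scores, strictest threshold first, so tied contexts are kept together.
    for t in np.unique(scores)[::-1]:
        kept = scores >= t
        good_kept = int(np.sum(kept & is_good))
        bad_kept = int(np.sum(kept & ~is_good))
        discarded_good = 1.0 - good_kept / n_good
        ratio = good_kept / bad_kept if bad_kept > 0 else float("inf")
        points.append((discarded_good, ratio))
    return points

def best_ratio_at(points, max_discard):
    """Highest good-to-bad ratio reachable while discarding at most max_discard of good contexts."""
    feasible = [ratio for discarded, ratio in points if discarded <= max_discard]
    return max(feasible) if feasible else None

# Toy example with made-up scores and labels (not data from the paper).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
is_good = [True, True, False, True, False, False]
points = retention_competency_curve(scores, is_good)
print(best_ratio_at(points, 0.0))  # 3.0
print(best_ratio_at(points, 0.5))  # inf
```

One possible rendering of the curve plots the discarded fraction on the x-axis against the finite ratios on a log-scaled y-axis. A log scale suits the y-axis because the ratio can grow very large as the threshold tightens.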

Taxonomy

Topics: Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling