Predicting Contextual Informativeness for Vocabulary Learning using Deep Learning
Tao Wu, Adam Kapelner

TL;DR
This paper presents a deep learning system that automatically identifies highly informative contexts for vocabulary learning, compares three modeling approaches, and introduces a new metric, the Retention Competency Curve, to evaluate them.
Contribution
It introduces a supervised fine-tuning approach with Qwen3 embeddings and handcrafted features, achieving significant improvements in selecting quality contexts for vocabulary instruction.
Findings
Model (iii), which combines fine-tuned Qwen3 embeddings with handcrafted context features, achieves a good-to-bad ratio of 440 while discarding only 70% of good contexts.
Supervised fine-tuning with Qwen3 embeddings outperforms unsupervised similarity-based methods.
The Retention Competency Curve effectively visualizes trade-offs in context selection performance.
Abstract
We describe a modern deep learning system that automatically identifies informative contextual examples ("contexts") for first language vocabulary instruction for high school students. Our paper compares three modeling approaches: (i) an unsupervised similarity-based strategy using MPNet's uniformly contextualized embeddings, (ii) a supervised framework built on instruction-aware, fine-tuned Qwen3 embeddings with a nonlinear regression head, and (iii) model (ii) plus handcrafted context features. We introduce a novel metric called the Retention Competency Curve to visualize trade-offs between the discarded proportion of good contexts and the "good-to-bad" contexts ratio, providing a compact, unified lens on model performance. Model (iii) delivers the most dramatic gains, reaching a good-to-bad ratio of 440 while discarding only 70% of the good contexts. In summary,…
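The trade-off that the Retention Competency Curve visualizes can be illustrated with a small sketch. The abstract does not give the exact construction, so the following is only one plausible reading: it assumes each context has a model score and a binary good/bad label, sweeps a score threshold, and reports for each threshold the fraction of good contexts discarded and the good-to-bad ratio among the contexts kept. The function name `retention_competency_curve` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def retention_competency_curve(
    scores: List[float], is_good: List[int]
) -> List[Tuple[float, float]]:
    """Sketch of a retention/competency trade-off curve (assumed definition).

    For every distinct score used as a threshold, contexts scoring at or
    above the threshold are kept. Each returned point is
    (fraction of good contexts discarded, good-to-bad ratio among kept).
    """
    if len(scores) != len(is_good):
        raise ValueError("scores and is_good must have the same length")
    total_good = sum(is_good)
    if total_good == 0:
        raise ValueError("at least one good context is required")

    points = []
    for threshold in sorted(set(scores)):
        kept = [g for s, g in zip(scores, is_good) if s >= threshold]
        kept_good = sum(kept)
        kept_bad = len(kept) - kept_good
        discarded_good = 1 - kept_good / total_good
        # The ratio is unbounded when no bad context survives the threshold.
        ratio = kept_good / kept_bad if kept_bad else float("inf")
        points.append((discarded_good, ratio))
    return points


# Toy data (hypothetical): 1 marks a good context, 0 a bad one.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

for discarded, ratio in retention_competency_curve(toy_scores, toy_labels):
    print(f"discarded good: {discarded:.2f}  good-to-bad ratio: {ratio:.2f}")
```

Under this reading, a result such as "a good-to-bad ratio of 440 while discarding 70% of the good contexts" corresponds to a single point on the curve: raising the threshold generally buys a cleaner retained set at the cost of losing more good contexts.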
Taxonomy
Topics: Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling
