From Data to Dialogue: Unlocking Language for All
Dakota Ellis, Samy Bakikerali, Wanshan Chen, Bao Dinh, Uyen Le

TL;DR
This paper develops a specialized word list for language learners that reaches 95% coverage with fewer words than the industry-standard NGSL, using an objective process that can be automated and scaled.
Contribution
It introduces a method for building a specialized word list tailored to a specific subset of a corpus, which covers that subset more efficiently than an existing general service list.
Findings
SWL reaches 95% coverage with fewer words than the NGSL
Automated, scalable process for creating SWL
Objective criteria enable customization for learners
Abstract
Traditional linguists have proposed the use of a General Service List (GSL) to help new language learners identify the most important words in English. Building one requires linguistic expertise, subjective input, and a considerable amount of time. We attempted to create our own GSL and evaluated its practicality against the industry standard, the New General Service List (NGSL). We found that creating a Specialized Word List (SWL), a word list specific to a subset of the overall corpus, is the most practical way for language learners to optimize the process. The SWLs we created using our model outperformed the industry standard, reaching the 95% coverage required for language comprehension with fewer words. Because the SWL process is restricted to objective criteria only, it can be automated, scaled, and tailored to the needs of language learners across the globe.
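To make the 95% coverage criterion concrete, the sketch below shows one simple way to measure coverage and to build a word list that reaches a target coverage on a given sub-corpus. It is an illustration only, not the authors' model: it assumes plain frequency ranking, a naive regex tokenizer, and no lemmatization, and the function names (`tokenize`, `words_for_coverage`, `coverage`) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def words_for_coverage(tokens, target=0.95):
    """Return the shortest frequency-ranked word list whose tokens
    account for at least `target` of all tokens in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return []
    word_list = []
    covered = 0
    # Add words from most to least frequent until the target is reached.
    for word, freq in counts.most_common():
        word_list.append(word)
        covered += freq
        if covered / total >= target:
            break
    return word_list


def coverage(word_list, tokens):
    """Fraction of corpus tokens that appear in `word_list`."""
    if not tokens:
        return 0.0
    vocab = set(word_list)
    return sum(1 for t in tokens if t in vocab) / len(tokens)
```

Under these assumptions, comparing a specialized list with a general one amounts to calling `coverage` with each list on the same sub-corpus and comparing the list sizes needed to reach 0.95.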
Taxonomy
Topics: Second Language Acquisition and Learning · Natural Language Processing Techniques · Text Readability and Simplification
