AI Knows Which Words Will Appear in Next Year's Korean CSAT

Byunghyun Ban; Jejong Lee; Hyeonmok Hwang

arXiv:2211.15426·cs.CL·August 4, 2023

TL;DR

This paper presents a novel AI-based method combining text-mining and LSTM deep learning to predict Korean CSAT vocabulary appearance in the next year with high accuracy, outperforming previous approaches.

Contribution

It introduces a new preprocessing technique and an LSTM-based prediction model that significantly improve vocabulary prediction accuracy for Korean CSAT exams.

Findings

01

Achieved 100% accuracy in the 100-score range of the AI-Score.

02

Predicted word appearance with only 1.7% error where scores exceeded 60.

03

Developed a data screening tool with 4.35 to 6.21 times higher efficiency than previous work.

Abstract

This paper introduces a text-mining-based word class categorization method and an LSTM-based vocabulary pattern prediction method. It first describes a preprocessing method based on simple text appearance frequency analysis. This method was developed as a data screening tool, but it showed 4.35 to 6.21 times higher efficiency than previous work. An LSTM deep learning method is also proposed for predicting vocabulary appearance patterns. The AI performs regression over data windows of various sizes, drawn from previous exams, to predict the probability that each word appears in the next exam. The values predicted over the different data windows are combined as a weighted sum into a single score, called the "AI-Score", which represents the probability that a word appears in next year's exam. The suggested method showed 100% accuracy in the 100-score range and only 1.7% prediction error in the…
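The aggregation step described in the abstract (one prediction per data window, combined by a weighted sum) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the window sizes, the weights, and the 0 to 100 scaling are all assumptions, and `stand_in_predictor` is a hypothetical placeholder that returns the mean appearance rate where the paper uses a trained LSTM regressor.

```python
from typing import Callable, Sequence


def stand_in_predictor(window: Sequence[int]) -> float:
    """Hypothetical placeholder for the trained LSTM regressor.

    Returns the mean appearance rate in the window (0.0 to 1.0).
    """
    return sum(window) / len(window)


def ai_score(
    history: Sequence[int],
    window_sizes: Sequence[int],
    weights: Sequence[float],
    predict: Callable[[Sequence[int]], float] = stand_in_predictor,
) -> float:
    """Combine per-window predictions into one score via a weighted sum.

    history      : 1 if the word appeared in that year's exam, else 0,
                   ordered from oldest to most recent exam.
    window_sizes : how many of the most recent exams each prediction sees
                   (assumed values, not taken from the paper).
    weights      : one weight per window size (assumed values).
    """
    if len(window_sizes) != len(weights):
        raise ValueError("need exactly one weight per window size")
    if not window_sizes:
        raise ValueError("need at least one window size")
    if any(size < 1 or size > len(history) for size in window_sizes):
        raise ValueError("each window size must fit inside the history")

    total_weight = sum(weights)
    weighted_sum = 0.0
    for size, weight in zip(window_sizes, weights):
        window = history[-size:]  # the most recent `size` exams
        weighted_sum += weight * predict(window)

    # Normalize by the total weight and scale to 0..100 (assumed scale).
    return 100.0 * weighted_sum / total_weight


# Example: a word that appeared in five of the last six exams.
example_history = [1, 0, 1, 1, 1, 1]
example_score = ai_score(example_history, [2, 4, 6], [0.5, 0.3, 0.2])
print(round(example_score, 2))
```

In this sketch the short windows dominate the score because they carry the larger assumed weights; swapping `predict` for a trained model would leave the aggregation logic unchanged.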

Code & Models

Repositories

needleworm/bigdata_voca
none · Official

Taxonomy

Topics: Technology and Data Analysis · Diverse Approaches in Healthcare and Education Studies · Educational Systems and Policies

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory