AI Knows Which Words Will Appear in Next Year's Korean CSAT
Byunghyun Ban, Jejong Lee, Hyeonmok Hwang

TL;DR
This paper presents a novel AI-based method that combines text mining with LSTM deep learning to predict, with high accuracy, which vocabulary will appear in next year's Korean CSAT, outperforming previous approaches.
Contribution
It introduces a new frequency-based preprocessing technique and an LSTM-based prediction model that significantly improve vocabulary prediction accuracy for the Korean CSAT.
Findings
Achieved 100% accuracy in the 100-score range.
Predicted word appearance with only 1.7% error where scores exceeded 60.
Developed a data screening tool that was 4.35 to 6.21 times more efficient than previous works.
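The screening tool rests on what the paper calls simple text appearance frequency analysis. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: each past exam is assumed to be a list of already-tokenized words, a word's frequency is taken to be the fraction of past exams it appears in, and the cutoff `threshold` is a hypothetical parameter.

```python
from collections import Counter


def appearance_frequency(exams):
    """Fraction of past exams in which each word appears at least once.

    Assumption: `exams` is a list of token lists, one list per past exam.
    """
    counts = Counter()
    for tokens in exams:
        counts.update(set(tokens))  # count a word at most once per exam
    n = len(exams)
    return {word: c / n for word, c in counts.items()}


def screen(exams, threshold):
    """Keep words whose appearance frequency reaches a hypothetical cutoff."""
    freq = appearance_frequency(exams)
    return sorted(word for word, f in freq.items() if f >= threshold)


# Hypothetical toy data: three tokenized "past exams".
exams = [
    ["the", "economy", "grows", "the"],
    ["the", "policy", "economy"],
    ["a", "policy", "shift"],
]
print(screen(exams, threshold=2 / 3))  # ['economy', 'policy', 'the']
```

Counting each word once per exam keeps a single repetitive passage from inflating a word's score; whether the original method counts per exam or per occurrence is not stated here.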
Abstract
A text-mining-based word class categorization method and an LSTM-based vocabulary pattern prediction method are introduced in this paper. A preprocessing method based on simple text appearance frequency analysis is described first. This method was developed as a data screening tool but showed 4.35 to 6.21 times higher efficiency than previous works. An LSTM deep learning method is also proposed for predicting vocabulary appearance patterns. The AI performs regression over data windows of various sizes drawn from previous exams to predict the probability that each word appears in the next exam. The AI's predictions over the various data windows are combined into a single score by a weighted sum, which we call the "AI-Score"; it represents the probability of a word appearing in next year's exam. The suggested method showed 100% accuracy in the 100-score range and only 1.7% prediction error in the…
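Two steps from the abstract can be sketched concretely: turning a word's exam history into fixed-size data windows for regression, and merging the per-window predictions into one AI-Score. The code below is a hypothetical illustration, not the authors' code. It assumes each word's history is a 0/1 list with one entry per past exam, and that the weighted sum is normalized by the total weight and scaled to 0 to 100 (consistent with the 100-score range mentioned, but not confirmed by the text); equal weights are used when none are given. The names `make_windows` and `ai_score` are invented for this example.

```python
def make_windows(history, window):
    """Split a per-exam 0/1 appearance history into (window, next-exam label) pairs.

    Assumption: history[t] is 1 if the word appeared in exam t, else 0.
    """
    return [
        (history[i:i + window], history[i + window])
        for i in range(len(history) - window)
    ]


def ai_score(predictions, weights=None):
    """Combine per-window probabilities into one 0-100 score as a weighted sum.

    predictions: {window_size: predicted probability of appearing next year}
    weights: {window_size: weight}; equal weights if omitted (an assumption).
    """
    if weights is None:
        weights = {w: 1.0 for w in predictions}
    total = sum(weights[w] for w in predictions)
    return 100.0 * sum(weights[w] * p for w, p in predictions.items()) / total


history = [1, 0, 1, 1, 0, 1]  # hypothetical: appeared in exams 1, 3, 4 and 6
print(make_windows(history, 3))  # [([1, 0, 1], 1), ([0, 1, 1], 0), ([1, 1, 0], 1)]
print(ai_score({3: 0.8, 5: 0.6}))
```

Each `(window, label)` pair would serve as one training sample for a window-specific regressor; `ai_score` then takes each regressor's next-exam probability, keyed by window size.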
Taxonomy
Topics: Technology and Data Analysis · Diverse Approaches in Healthcare and Education Studies · Educational Systems and Policies
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
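The Methods list pairs the LSTM with sigmoid and tanh activations, which play distinct roles inside each LSTM step: sigmoid gates decide how much to forget, write, and expose, while tanh squashes candidate and cell values. The pure-Python sketch below runs one LSTM layer over a word's 0/1 appearance history and reads out a probability through a final sigmoid. It is an illustration under assumptions only: the hidden size, the random untrained weights, and the class name `TinyLSTM` are hypothetical, and training is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTM:
    """Hypothetical single-layer LSTM over a scalar sequence with a sigmoid read-out.

    Weights are random and untrained; this only shows the gate arithmetic.
    """

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.5, 0.5)

        self.hidden = hidden
        # Per gate (i=input, f=forget, o=output, g=candidate):
        # input weights w, recurrent weights u, bias b.
        self.gates = {
            name: {
                "w": [rand() for _ in range(hidden)],
                "u": [[rand() for _ in range(hidden)] for _ in range(hidden)],
                "b": [0.0] * hidden,
            }
            for name in ("i", "f", "o", "g")
        }
        self.v = [rand() for _ in range(hidden)]  # read-out weights
        self.v_b = 0.0

    def _pre(self, gate, x, h):
        """Pre-activation w*x + U.h + b for one gate, per hidden unit."""
        p = self.gates[gate]
        return [
            p["w"][k] * x
            + sum(p["u"][k][j] * h[j] for j in range(self.hidden))
            + p["b"][k]
            for k in range(self.hidden)
        ]

    def predict(self, sequence):
        """Run the LSTM over the sequence and return a probability in (0, 1)."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in sequence:
            i = [sigmoid(z) for z in self._pre("i", x, h)]    # input gate
            f = [sigmoid(z) for z in self._pre("f", x, h)]    # forget gate
            o = [sigmoid(z) for z in self._pre("o", x, h)]    # output gate
            g = [math.tanh(z) for z in self._pre("g", x, h)]  # candidate values
            c = [f[k] * c[k] + i[k] * g[k] for k in range(self.hidden)]
            h = [o[k] * math.tanh(c[k]) for k in range(self.hidden)]
        return sigmoid(sum(vk * hk for vk, hk in zip(self.v, h)) + self.v_b)


model = TinyLSTM(hidden=4, seed=0)
print(model.predict([1, 0, 1, 1, 0]))  # untrained, so the value itself is arbitrary
```

In a trained setting the weights would be fit so that the read-out approximates the next-exam label for each window; the fixed seed here only makes the untrained output reproducible.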
