On the robustness of ChatGPT in teaching Korean Mathematics
Phuong-Nam Nguyen, Quang Nguyen-The, An Vu-Minh, Diep-Anh Nguyen, Xuan-Lam Pham

TL;DR
This study evaluates ChatGPT's robustness in answering Korean mathematics questions, revealing strengths in question classification but challenges in non-English contexts, and suggests directions for improving multilingual educational AI tools.
Contribution
It provides an empirical assessment of ChatGPT's performance on Korean math questions and analyzes its ability to rate questions and handle non-English content.
Findings
ChatGPT achieves 66.72% accuracy on Korean math questions.
It aligns well with educational criteria in question rating.
It struggles with non-English contexts, indicating linguistic biases.
Abstract
ChatGPT, an Artificial Intelligence model, has the potential to revolutionize education. However, its effectiveness in solving non-English questions remains uncertain. This study evaluates ChatGPT's robustness using 586 Korean mathematics questions. ChatGPT achieves 66.72% accuracy, correctly answering 391 out of 586 questions. We also assess its ability to rate mathematics questions based on eleven criteria and perform a topic analysis. Our findings show that ChatGPT's ratings align with educational theory and test-taker perspectives. While ChatGPT performs well in question classification, it struggles with non-English contexts, highlighting areas for improvement. Future research should address linguistic biases and enhance accuracy across diverse languages. Domain-specific optimizations and multilingual training could improve ChatGPT's role in personalized education.
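The headline accuracy figure follows directly from the reported counts (391 correct answers out of 586 questions). The short Python check below reproduces it. The helper name `accuracy` and the two-decimal rounding are illustrative choices, not something taken from the paper.

```python
def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage, rounded to two decimal places.

    Hypothetical helper for illustration only; not from the paper.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of questions")
    return round(100 * correct / total, 2)


# Counts reported in the abstract: 391 correct out of 586 Korean math questions.
pct = accuracy(391, 586)
print(f"{pct:.2f}%")  # prints "66.72%"
```

The same counts also imply 586 - 391 = 195 questions answered incorrectly.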
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
Methods: ALIGN
