Examining the Robustness of Large Language Models across Language Complexity
Jiayi Zhang

TL;DR
This study evaluates the robustness of large language models used in student assessments across varying levels of language complexity, focusing on their ability to detect self-regulated learning in math problem-solving texts.
Contribution
It provides an empirical analysis of how LLM-based student models perform with texts of different linguistic complexities, highlighting their robustness issues.
Findings
Models perform differently on high- and low-complexity texts
Language complexity impacts the accuracy of student models
Robustness varies across linguistic measures
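The findings compare model behavior on high- and low-complexity texts, but this page does not say which linguistic measures the study used. The sketch below shows one way such a comparison could be run, assuming the Flesch-Kincaid grade level as a stand-in complexity measure, a crude vowel-run syllable counter, and a median split into "low" and "high" groups. All function names here (`count_syllables`, `fk_grade`, `accuracy_by_complexity`) are hypothetical and not taken from the paper.

```python
import re
from statistics import median


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level, used here as one assumed complexity measure."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def accuracy_by_complexity(texts, labels, predictions):
    """Median-split texts by complexity and report detector accuracy per group."""
    scores = [fk_grade(t) for t in texts]
    cut = median(scores)
    hits = {"low": [], "high": []}
    for score, y, y_hat in zip(scores, labels, predictions):
        # Texts strictly above the median score count as high complexity.
        hits["high" if score > cut else "low"].append(y == y_hat)
    return {group: (sum(h) / len(h) if h else float("nan"))
            for group, h in hits.items()}


texts = [
    "I add. I check.",
    "We try it. It works.",
    "Subsequently, I systematically reconsidered the intermediate calculations to verify consistency.",
    "Alternatively, the probability distribution demonstrates considerable variability.",
]
print(accuracy_by_complexity(texts, labels=[1, 0, 1, 0], predictions=[1, 0, 0, 1]))
# prints {'low': 1.0, 'high': 0.0}
```

A gap between the two group accuracies is the kind of robustness issue the findings describe; repeating the split with several different complexity measures would correspond to checking whether robustness "varies across linguistic measures".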
Abstract
With the advancement of large language models (LLMs), an increasing number of student models have leveraged LLMs to analyze textual artifacts generated by students to understand and evaluate their learning. These student models typically employ pre-trained LLMs to vectorize text inputs into embeddings and then use the embeddings to train models that detect the presence or absence of a construct of interest. However, how reliable and robust are these models when processing language with different levels of complexity? In learning contexts where students come from different language backgrounds and have varying levels of writing skill, it is critical to examine the robustness of such models to ensure that they work equally well for texts of varying language complexity. A few studies, though limited in number, show that the use of language can indeed impact the…
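The abstract describes a two-stage pipeline: a pre-trained LLM turns each text into an embedding, and a separate model trained on those embeddings detects whether a construct (here, self-regulated learning) is present. This page does not name the LLM or the classifier, so the sketch below is a runnable stand-in under stated assumptions: `embed` is a hypothetical hashed bag-of-words encoder that takes the place of a real LLM encoder, and the detector is plain logistic regression trained with stochastic gradient descent. `DIM`, `train_detector`, `predict_proba`, and `detect` are illustrative names, not the study's code.

```python
import math
import re
import zlib

DIM = 256  # assumed embedding width for the stand-in encoder


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for a pre-trained LLM encoder.

    Hashes each lowercase token into one of DIM buckets (a hashed bag of words)
    and L2-normalizes the counts. zlib.crc32 keeps hashing stable across runs,
    unlike Python's salted built-in hash().
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z']+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_detector(texts, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression detector on embeddings with plain SGD."""
    X = [embed(t) for t in texts]
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(X, labels):
            # Gradient of the log loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_proba(model, text: str) -> float:
    """Probability that the construct is present in text."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, embed(text))) + b)


def detect(model, text: str, threshold: float = 0.5) -> int:
    """1 if the construct is predicted present, else 0."""
    return int(predict_proba(model, text) >= threshold)


train_texts = [
    "First I plan my steps, then I check my work.",
    "I noticed my mistake and went back to fix it.",
    "The answer is twelve.",
    "x equals five.",
]
model = train_detector(train_texts, [1, 1, 0, 0])
print(detect(model, "I check my work after each step."))
```

Keeping the encoder and the detector as separate functions mirrors the pipeline in the abstract: swapping `embed` for a real LLM encoder leaves the training and detection code unchanged, which is also what makes it possible to test the same detector on texts of different complexity.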
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques
