Domain-Grounded Evaluation of LLMs in International Student Knowledge
Claudinei Daitx, Haitham Amar

TL;DR
This paper evaluates the reliability of large language models in advising international students on high-stakes questions, focusing on accuracy, hallucinations, and domain coverage using real-world questions from an EdTech platform.
Contribution
It introduces a domain-grounded evaluation protocol for assessing LLMs in educational advising, emphasizing accuracy, hallucination detection, and domain coverage analysis.
Findings
Models vary in accuracy and hallucination rates.
Common failure modes include incomplete and off-topic answers.
The protocol helps identify dependable models for educational use.
Abstract
Large language models (LLMs) are increasingly used to answer high-stakes study-abroad questions about admissions, visas, scholarships, and eligibility. Yet it remains unclear how reliably they advise students, and how often otherwise helpful answers drift into unsupported claims ("hallucinations"). This work provides a clear, domain-grounded overview of how current LLMs behave in this setting. Using a realistic question set drawn from the advising workflows of ApplyBoard, an EdTech platform that supports students from discovery to enrolment, we evaluate two essentials side by side: accuracy (is the information correct and complete?) and hallucination (does the model add content not supported by the question or domain evidence?). The questions are categorized by domain scope, which can be single-domain or multi-domain, the latter when it must integrate evidence across areas such as…
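As a rough illustration of what evaluating accuracy and hallucination side by side, split by domain scope, could look like in practice, the sketch below aggregates per-answer judgments into per-scope rates. It is not the paper's protocol: the `JudgedAnswer` record, its field names, and the `rates_by_scope` helper are hypothetical, and it assumes each answer has already been labeled as accurate or not and as hallucinated or not by some judging step the abstract does not describe.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JudgedAnswer:
    """One model answer with its labels (hypothetical schema, not from the paper)."""
    scope: str          # "single-domain" or "multi-domain"
    accurate: bool      # judged correct and complete
    hallucinated: bool  # judged to add content not supported by the evidence


def rates_by_scope(answers):
    """Return accuracy and hallucination rate for each domain scope."""
    counts = defaultdict(lambda: {"n": 0, "accurate": 0, "hallucinated": 0})
    for answer in answers:
        bucket = counts[answer.scope]
        bucket["n"] += 1
        bucket["accurate"] += answer.accurate          # bool counts as 0 or 1
        bucket["hallucinated"] += answer.hallucinated
    return {
        scope: {
            "accuracy": bucket["accurate"] / bucket["n"],
            "hallucination_rate": bucket["hallucinated"] / bucket["n"],
        }
        for scope, bucket in counts.items()
    }
```

Keeping the two labels separate means an answer can count as accurate and still be flagged for adding unsupported content, which matches the abstract's framing of the two measures as distinct.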
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Educational Assessment and Pedagogy
