An Exploration of Higher Education Course Evaluation by Large Language Models
Bo Yuan, Jiazi Hu

TL;DR
This paper explores how large language models can automate and improve course evaluations in higher education by analyzing classroom discussions and providing structured, reliable feedback that aligns with expert judgments.
Contribution
The paper shows that LLMs, particularly a fine-tuned Llama model, can conduct detailed course evaluations reliably and at scale, addressing the subjectivity, labor cost, and limited scalability of traditional methods.
Findings
LLMs can reliably evaluate courses at both the micro level (classroom discussion analysis) and the macro level (holistic course review)
Fine-tuning improves evaluation accuracy and consistency
LLM feedback offers actionable insights for teaching improvement
Abstract
Course evaluation plays a critical role in ensuring instructional quality and guiding curriculum development in higher education. However, traditional evaluation methods, such as student surveys, classroom observations, and expert reviews, are often constrained by subjectivity, high labor costs, and limited scalability. With recent advancements in large language models (LLMs), new opportunities have emerged for generating consistent, fine-grained, and scalable course evaluations. This study investigates the use of three representative LLMs for automated course evaluation at both the micro level (classroom discussion analysis) and the macro level (holistic course review). Using classroom interaction transcripts and a dataset of 100 courses from a major institution in China, we demonstrate that LLMs can extract key pedagogical features and generate structured evaluation results aligned…
Taxonomy
Topics: Diverse Approaches in Healthcare and Education Studies · Technology and Data Analysis · Computational and Text Analysis Methods
