Ensemble ToT of LLMs and Its Application to Automatic Grading System for Supporting Self-Learning
Yuki Ito, Qiang Ma

TL;DR
This paper introduces Ensemble ToT, a framework that combines multiple LLMs to improve automatic grading accuracy and explainability, supporting self-learning by providing detailed feedback.
Contribution
It presents a novel ensemble framework for LLMs that enhances grading performance through multi-model integration and simulated debate, surpassing single-model approaches.
Findings
Improved grading accuracy with ensemble approach
Enhanced explainability of grading decisions
Effective coordination of multiple LLMs in evaluation
Abstract
Providing students with detailed and timely grading feedback is essential for self-learning. Existing LLM-based grading systems are promising, but most rely on a single model, which limits their performance. To address this, we propose Ensemble Tree-of-Thought (ToT), a framework that enhances LLM outputs by integrating multiple models, and we use it to develop a grading system. Ensemble ToT follows three steps: (1) analyzing LLM performance, (2) generating candidate answers, and (3) refining them into a final result. Following these steps, our grading system first evaluates the grading tendencies of the LLMs, then generates multiple grading results, and finally integrates them via a simulated debate. Experimental results demonstrate that our approach provides accurate and explainable grading by effectively coordinating multiple LLMs.
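The three steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the graders are stub functions standing in for LLM calls, the names `analyze_tendencies`, `generate_candidates`, and `integrate` are hypothetical, and the simulated debate of step 3 is replaced by a much simpler stand-in (a median of bias-corrected scores).

```python
from statistics import mean, median
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a grader maps a student answer to a numeric score.
# In a real system each grader would wrap a call to a different LLM.
Grader = Callable[[str], float]


def analyze_tendencies(
    graders: Dict[str, Grader],
    calibration: List[Tuple[str, float]],
) -> Dict[str, float]:
    """Step 1: estimate each grader's mean signed error on reference-graded answers."""
    return {
        name: mean(grade(answer) - reference for answer, reference in calibration)
        for name, grade in graders.items()
    }


def generate_candidates(graders: Dict[str, Grader], answer: str) -> Dict[str, float]:
    """Step 2: collect one candidate score per grader."""
    return {name: grade(answer) for name, grade in graders.items()}


def integrate(candidates: Dict[str, float], bias: Dict[str, float]) -> float:
    """Step 3 (stand-in for the simulated debate): median of bias-corrected scores."""
    return median(score - bias[name] for name, score in candidates.items())


# Stub graders (assumptions for illustration): they score by word count,
# with a fixed lenient or strict offset.
graders: Dict[str, Grader] = {
    "lenient": lambda a: len(a.split()) + 1.0,
    "strict": lambda a: len(a.split()) - 1.0,
    "fair": lambda a: float(len(a.split())),
}

# Calibration answers paired with reference scores.
calibration = [("a b", 2.0), ("a b c d", 4.0)]

bias = analyze_tendencies(graders, calibration)
candidates = generate_candidates(graders, "a b c")
final_score = integrate(candidates, bias)
print(bias, candidates, final_score)
```

The design point the sketch illustrates is that step 1 gives step 3 something to work with: once each grader's tendency is known, disagreeing candidates can be reconciled rather than simply averaged. The paper's debate-based integration would also produce an explanation, which this numeric stand-in does not.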
Taxonomy
Topics: Educational Technology and Assessment
