Ensemble ToT of LLMs and Its Application to Automatic Grading System for Supporting Self-Learning

Yuki Ito; Qiang Ma

arXiv:2502.16399·cs.IR·February 25, 2025

Open Access

TL;DR

This paper introduces Ensemble ToT, a framework that combines multiple LLMs to improve automatic grading accuracy and explainability, supporting self-learning by providing detailed feedback.

Contribution

It presents a novel ensemble framework for LLMs that enhances grading performance through multi-model integration and simulated debate, surpassing single-model approaches.

Findings

01. Improved grading accuracy with ensemble approach

02. Enhanced explainability of grading decisions

03. Effective coordination of multiple LLMs in evaluation

Abstract

Providing students with detailed and timely grading feedback is essential for self-learning. While existing LLM-based grading systems are promising, most rely on a single model, which limits their performance. To address this, we propose Ensemble Tree-of-Thought (ToT), a framework that enhances LLM outputs by integrating multiple models. Using this framework, we develop a grading system. Ensemble ToT follows three steps: (1) analyzing LLM performance, (2) generating candidate answers, and (3) refining them into a final result. Following these steps, our grading system first evaluates the grading tendencies of LLMs, then generates multiple grading results, and finally integrates them via a simulated debate. Experimental results demonstrate our approach's ability to provide accurate and explainable grading by effectively coordinating multiple LLMs.
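The three steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so every name here (`analyze_tendencies`, `generate_candidates`, `integrate`, the stub graders) is hypothetical, the "LLMs" are plain Python functions, and the simulated debate is replaced by a much simpler bias-corrected weighted average. It only shows how per-model grading tendencies measured in step 1 might inform the integration in step 3.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM grader: maps a student answer to a score.
Grader = Callable[[str], float]


def analyze_tendencies(graders: Dict[str, Grader],
                       calibration: List[Tuple[str, float]]) -> Dict[str, float]:
    """Step 1 (assumed form): mean signed error of each grader on
    answers whose reference scores are known."""
    return {
        name: mean(grade(answer) - truth for answer, truth in calibration)
        for name, grade in graders.items()
    }


def generate_candidates(graders: Dict[str, Grader], answer: str) -> Dict[str, float]:
    """Step 2: collect one candidate score per grader."""
    return {name: grade(answer) for name, grade in graders.items()}


def integrate(candidates: Dict[str, float], bias: Dict[str, float]) -> float:
    """Step 3 stand-in: NOT a debate. Subtract each grader's measured bias,
    then average with weights that shrink as the bias grows."""
    weights = {name: 1.0 / (1.0 + abs(bias[name])) for name in candidates}
    total = sum(weights.values())
    return sum(weights[n] * (candidates[n] - bias[n]) for n in candidates) / total


# Toy graders that score by word count, with a fixed offset each.
graders: Dict[str, Grader] = {
    "strict": lambda a: len(a.split()) - 1,
    "lenient": lambda a: len(a.split()) + 2,
    "fair": lambda a: float(len(a.split())),
}
calibration = [("a b c", 3.0), ("a b c d e", 5.0)]

bias = analyze_tendencies(graders, calibration)
candidates = generate_candidates(graders, "a b c d")
final = integrate(candidates, bias)
print(bias, candidates, final)
```

In this toy setup the strict and lenient graders disagree by three points on the same answer, but once each one's measured tendency is subtracted they agree, which is the intuition behind analyzing grading tendencies before integrating results. A debate-based integration, as the paper describes, would additionally produce a textual rationale, which this numeric sketch does not attempt.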

Taxonomy

Topics: Educational Technology and Assessment