From Prompting to Preference Optimization: A Comparative Study of LLM-based Automated Essay Scoring

Minh Hoang Nguyen; Vu Hoang Pham; Xuan Thanh Huynh; Phuc Hong Mai; Vinh The Nguyen; Quang Nhut Huynh; Huy Tien Nguyen; Tung Le

arXiv:2603.06424·cs.CL·March 9, 2026

Open Access

TL;DR

This paper empirically compares LLM-based automated essay scoring methods on IELTS Writing Task 2, reports accuracy-cost-robustness trade-offs among them, and finds that combining supervised fine-tuning with retrieval-augmented generation (RAG) performs best.

Contribution

The paper offers the first unified comparison of modern LLM-based AES strategies for English L2 writing, highlighting their relative merits and the best-performing configuration.

Findings

01  Best method achieves an F1-score of 93%

02  Clear accuracy-cost-robustness trade-offs identified

03  Integration of supervised fine-tuning and RAG yields the strongest results

Abstract

Large language models (LLMs) have recently reshaped Automated Essay Scoring (AES), yet prior studies typically examine individual techniques in isolation, limiting understanding of their relative merits for English as a Second Language (L2) writing. To bridge this gap, we present a comprehensive comparison of major LLM-based AES paradigms on IELTS Writing Task 2. On this unified benchmark, we evaluate four approaches: (i) encoder-based classification fine-tuning, (ii) zero- and few-shot prompting, (iii) instruction tuning and Retrieval-Augmented Generation (RAG), and (iv) Supervised Fine-Tuning combined with Direct Preference Optimization (DPO) and RAG. Our results reveal clear accuracy-cost-robustness trade-offs across methods. The best configuration, integrating k-SFT and RAG, achieves the strongest overall results, with an F1-score of 93%. This study offers the first unified empirical…
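
To make the retrieval-augmented idea in the abstract concrete, here is a minimal sketch of how previously scored essays might be retrieved and placed in a scoring prompt. It is not the authors' pipeline: the bag-of-words cosine retriever, the prompt wording, the function names (`retrieve`, `build_prompt`), and the toy essays and band scores are all assumptions made for illustration, and the resulting prompt would still need to be sent to a model of your choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, scored_essays, k=2):
    """Return the k scored essays most similar to the query essay.

    Assumption: a simple lexical retriever stands in for whatever
    retriever a real system would use (for example dense embeddings).
    """
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        scored_essays,
        key=lambda item: cosine(query_vec, Counter(tokenize(item["essay"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, exemplars):
    """Format retrieved exemplars and the new essay as one scoring prompt.

    The wording below is hypothetical, not taken from the paper.
    """
    parts = ["Score the final essay on the IELTS band scale."]
    for item in exemplars:
        parts.append(f"Essay: {item['essay']}\nBand: {item['band']}")
    parts.append(f"Essay: {query}\nBand:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Toy data, invented for this sketch.
    corpus = [
        {"essay": "Governments should invest in public transport "
                  "to reduce traffic and pollution.", "band": 7.0},
        {"essay": "Children learn languages faster when they start "
                  "at primary school.", "band": 6.0},
        {"essay": "Public transport investment reduces pollution "
                  "in cities.", "band": 6.5},
    ]
    new_essay = "Investing in public transport is the best way to cut pollution."
    print(build_prompt(new_essay, retrieve(new_essay, corpus, k=2)))
```

The retrieved exemplars act as in-context anchors for the band scale; in a fine-tuned setup the same prompt format would presumably be used during training as well, though the abstract does not specify this.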

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques