MAGIC: Multi-Agent Argumentation and Grammar Integrated Critiquer
Joaquín Jordán, Xavier Yin, Melissa Fabros, Gireeja Ranade, Narges Norouzi

TL;DR
MAGIC is a multi-agent framework that scores essays on multiple criteria and generates detailed feedback. It is especially effective on college-level writing and outperforms baseline models in scoring accuracy and interpretability.
Contribution
Introduces MAGIC, a novel multi-agent system for holistic essay scoring and feedback generation tailored to college-level assessment, together with a new dataset of GRE practice essays with expert scores and feedback.
Findings
Achieves substantial to near-perfect agreement with human scores on GRE essays.
Outperforms baseline LLMs in scoring accuracy and feedback quality.
Provides interpretable evaluations through multi-agent collaboration.
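Agreement between automated and human essay scores is commonly reported as quadratic weighted kappa (QWK). This summary does not name the metric MAGIC uses, so the sketch below assumes QWK on integer scores, and it assumes the phrases "substantial" and "near-perfect" follow the Landis and Koch (1977) kappa bands, where the top band is labeled "almost perfect". The function names are illustrative. GRE Analytical Writing scores are reported on a 0-6 scale in half-point steps; one option, also an assumption, is to double such scores to integers before calling the function.

```python
from collections import Counter
from typing import Sequence


def quadratic_weighted_kappa(human: Sequence[int], model: Sequence[int],
                             min_score: int, max_score: int) -> float:
    """Cohen's kappa with quadratic disagreement weights for two raters."""
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("max_score must exceed min_score")
    for s in list(human) + list(model):
        if not min_score <= s <= max_score:
            raise ValueError(f"score {s} outside [{min_score}, {max_score}]")
    k = max_score - min_score + 1
    n = len(human)

    # observed[i][j] counts essays the human scored i and the model scored j.
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1

    # Marginal histograms give the expected matrix if the raters were independent.
    hist_h = Counter(h - min_score for h in human)
    hist_m = Counter(m - min_score for m in model)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at maximal disagreement
            weighted_observed += w * observed[i][j]
            weighted_expected += w * hist_h[i] * hist_m[j] / n

    if weighted_expected == 0:
        # Both raters gave one identical score to every essay: treat as perfect agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


def landis_koch_band(kappa: float) -> str:
    """Map a kappa value to the Landis & Koch (1977) agreement labels."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

For cross-checking, scikit-learn's `sklearn.metrics.cohen_kappa_score(y1, y2, weights="quadratic")` should give the same value when the full score range is passed through its `labels` argument; without it, only the labels that actually occur are used, which can change the weights.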
Abstract
Automated Essay Scoring (AES) and Automatic Essay Feedback (AEF) systems aim to reduce the workload of human raters in educational assessment. However, most existing systems prioritize numerical scoring accuracy over feedback quality and are primarily evaluated on pre-secondary school level writing. This paper presents Multi-Agent Argumentation and Grammar Integrated Critiquer (MAGIC), a framework using five specialized agents to evaluate prompt adherence, persuasiveness, organization, vocabulary, and grammar for both holistic scoring and detailed feedback generation. To support evaluation at the college level, we collated a dataset of Graduate Record Examination (GRE) practice essays with expert-evaluated scores and feedback. MAGIC achieves substantial to near-perfect scoring agreement with humans on the GRE data, outperforming baseline LLM models while providing enhanced…
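The abstract describes five specialized agents, one per criterion, whose outputs feed both a holistic score and detailed feedback. It does not say how MAGIC combines them, so the sketch below is hypothetical throughout: each agent is a plain function (a real one would wrap an LLM call), and the holistic score is assumed to be a weighted mean rounded to the nearest half point on a 0-6 scale modeled on GRE Analytical Writing. `CriterionResult`, `evaluate_essay`, `make_stub_agent`, and the equal default weights are illustrative names and choices, not the paper's.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# The five criteria named in the MAGIC abstract.
CRITERIA = ["prompt_adherence", "persuasiveness", "organization", "vocabulary", "grammar"]


@dataclass
class CriterionResult:
    criterion: str
    score: float   # hypothetical per-criterion score on a 0-6 scale
    feedback: str


# Hypothetical agent type: maps (prompt, essay) to a CriterionResult.
# A real agent would wrap an LLM call with a criterion-specific instruction.
Agent = Callable[[str, str], CriterionResult]


def round_half_up_to_half(x: float) -> float:
    """Round to the nearest 0.5, with ties rounding up (unlike Python's round)."""
    return math.floor(x * 2 + 0.5) / 2


def evaluate_essay(prompt: str, essay: str, agents: Dict[str, Agent],
                   weights: Optional[Dict[str, float]] = None
                   ) -> Tuple[float, List[CriterionResult], str]:
    """Run every criterion agent, then combine scores and feedback (assumed scheme)."""
    results = [agents[c](prompt, essay) for c in CRITERIA]
    if weights is None:
        weights = {c: 1.0 for c in CRITERIA}  # assumption: equal weighting
    total = sum(weights[r.criterion] for r in results)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(weights[r.criterion] * r.score for r in results) / total
    holistic = min(6.0, max(0.0, round_half_up_to_half(mean)))
    feedback = "\n".join(f"[{r.criterion}] {r.feedback}" for r in results)
    return holistic, results, feedback


def make_stub_agent(criterion: str, score: float) -> Agent:
    """Stand-in agent that ignores its input and returns a fixed score."""
    def agent(prompt: str, essay: str) -> CriterionResult:
        return CriterionResult(criterion, score, f"placeholder feedback on {criterion}")
    return agent


stubs = {c: make_stub_agent(c, s) for c, s in zip(CRITERIA, [5, 4, 4, 5, 6])}
score, per_criterion, report = evaluate_essay("GRE issue prompt", "Essay text", stubs)
```

With these stub scores the weighted mean is 4.8, which rounds to a holistic score of 5.0. Keeping per-criterion results alongside the combined score is what lets the feedback stay interpretable; a real system might replace the fixed weighted mean with a further aggregating agent.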
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Natural Language Processing Techniques
