Automated Multiple Mini Interview (MMI) Scoring

Ryan Huynh; Frank Guerin; Alison Callwood

arXiv:2602.02360·cs.CL·February 3, 2026

Open Access

TL;DR

This paper presents a multi-agent prompting framework that uses large language models to score Multiple Mini-Interviews (MMIs). It outperforms fine-tuned baselines (Avg QWK 0.62 vs 0.32) and reaches reliability comparable to human experts when assessing soft skills.

Contribution

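Introduces a multi-agent, prompt-based approach to MMI scoring that splits evaluation into transcript refinement and criterion-specific scoring, surpasses rationale-based fine-tuning methods, and generalizes to the ASAP benchmark without additional training.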

Findings

01

Outperforms specialised fine-tuned baselines on MMI scoring (Avg QWK 0.62 vs 0.32)

02

Achieves reliability comparable to human experts in MMI scoring

03

Rivals domain-specific models on the ASAP benchmark
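
QWK conventionally stands for quadratic weighted kappa, an agreement statistic that penalises disagreements between two raters by the squared distance between their scores. The sketch below shows one common way to compute it for integer scores on a known scale. The function name, the score range arguments, and the example scores are illustrative assumptions and are not taken from the paper.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa between two equal-length lists of integer scores.

    Illustrative helper (not from the paper). Scores must lie in
    [min_score, max_score].
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rater lists must be non-empty and of equal length")
    n = max_score - min_score + 1
    if n < 2:
        raise ValueError("the score scale needs at least two levels")
    total = len(rater_a)

    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if the two raters scored independently.
            expected = hist_a[i] * hist_b[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters used one identical score throughout.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Perfect agreement gives 1.0; chance-level agreement is near 0.
same = quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)
reversed_scores = quadratic_weighted_kappa([1, 2, 3], [3, 2, 1], 1, 3)
```

Under this definition, a value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement, as in the reversed example, which comes out at approximately -1.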

Abstract

Assessing soft skills such as empathy, ethical judgment, and communication is essential in competitive selection processes, yet human scoring is often inconsistent and biased. While Large Language Models (LLMs) have improved Automated Essay Scoring (AES), we show that state-of-the-art rationale-based fine-tuning methods struggle with the abstract, context-dependent nature of Multiple Mini-Interviews (MMIs), missing the implicit signals embedded in candidate narratives. We introduce a multi-agent prompting framework that breaks down the evaluation process into transcript refinement and criterion-specific scoring. Using 3-shot in-context learning with a large instruct-tuned model, our approach outperforms specialised fine-tuned baselines (Avg QWK 0.62 vs 0.32) and achieves reliability comparable to human experts. We further demonstrate the generalisability of our framework on the ASAP…
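
The two-stage structure described in the abstract (transcript refinement, then criterion-specific scoring with 3-shot in-context examples) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names, the `llm` callable (any function mapping a prompt string to a reply string), and the score-parsing rule are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (refined transcript, human-assigned score)


def build_scoring_prompt(criterion: str, shots: List[Example], transcript: str) -> str:
    """Assemble a few-shot prompt for one criterion (hypothetical wording)."""
    parts = [f"Score the candidate's response for the criterion: {criterion}."]
    for text, score in shots:
        parts.append(f"Response: {text}\nScore: {score}")
    parts.append(f"Response: {transcript}\nScore:")
    return "\n\n".join(parts)


def parse_score(reply: str) -> int:
    """Take the first integer in the model reply as the score (an assumption)."""
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"no score found in reply: {reply!r}")
    return int(match.group())


def score_response(
    transcript: str,
    examples_by_criterion: Dict[str, List[Example]],
    llm: Callable[[str], str],
) -> Dict[str, int]:
    """Run a refinement call, then one scoring call per criterion."""
    # Stage 1: a refinement agent cleans up the raw transcript.
    refined = llm("Refine this interview transcript:\n\n" + transcript).strip()

    # Stage 2: one scoring agent per criterion, each with up to 3 examples.
    scores = {}
    for criterion, examples in examples_by_criterion.items():
        prompt = build_scoring_prompt(criterion, examples[:3], refined)
        scores[criterion] = parse_score(llm(prompt))
    return scores
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; in practice `llm` would wrap a call to an instruct-tuned model.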

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education