DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following

Nardine Basta; Dali Kaafar

arXiv:2603.03321·cs.CL·March 5, 2026

Open Access

TL;DR

DIALEVAL introduces a type-theoretic, automated framework using dual LLM agents to evaluate instruction following in large language models, aligning better with human judgment and handling multi-turn dialogues.

Contribution

DIALEVAL presents an automated, formal method for evaluating instruction following that improves accuracy and correlation with human judgment, especially in complex and conversational settings.

Findings

01. Achieves 90.38% accuracy in evaluation tasks.

02. Reduces error rate by 26.45% compared to baselines.

03. Enhances evaluation in multi-turn dialogues.

Abstract

Evaluating instruction following in Large Language Models requires decomposing instructions into verifiable requirements and assessing satisfaction--tasks currently dependent on manual annotation and uniform criteria that do not align with human judgment patterns. We present DIALEVAL, a type-theoretic framework using dual LLM agents to automate instruction decomposition into typed predicates and implement type-specific satisfaction semantics. The framework enforces formal atomicity and independence constraints during automated extraction, then applies differentiated evaluation criteria--semantic equivalence for content predicates, exact precision for numerical predicates--mirroring empirically observed human assessment patterns. Extended to multi-turn dialogues through history-aware satisfaction functions, DIALEVAL enables evaluation in conversational contexts where single-turn methods…
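
To make the idea of type-specific satisfaction concrete, here is a minimal Python sketch of typed predicates that are judged differently depending on their type. Everything in it is an illustrative assumption rather than the authors' implementation: the names `PredicateType`, `Predicate`, `satisfied`, and `evaluate` are hypothetical, a case- and punctuation-insensitive phrase match stands in for the LLM-based semantic-equivalence judgment, and a strict numeric comparison stands in for exact precision. Automated extraction of predicates and the multi-turn extension are not modeled.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple, Union


class PredicateType(Enum):
    CONTENT = "content"      # judged leniently (stand-in for semantic equivalence)
    NUMERICAL = "numerical"  # judged strictly (exact value required)


@dataclass(frozen=True)
class Predicate:
    """One atomic requirement extracted from an instruction (hypothetical shape)."""
    kind: PredicateType
    description: str
    expected: Union[str, float]


def _normalize(text: str) -> str:
    # Lowercase and drop punctuation so surface differences do not matter.
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def satisfied(pred: Predicate, response: str) -> bool:
    """Dispatch on the predicate type to pick the satisfaction rule."""
    if pred.kind is PredicateType.CONTENT:
        # Assumption: whole-phrase containment approximates semantic equivalence.
        needle = f" {_normalize(str(pred.expected))} "
        return needle in f" {_normalize(response)} "
    if pred.kind is PredicateType.NUMERICAL:
        # Exact precision: the expected number must literally appear as digits.
        numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", response)]
        return float(pred.expected) in numbers
    raise ValueError(f"unknown predicate type: {pred.kind}")


def evaluate(preds: List[Predicate], response: str) -> Tuple[Dict[str, bool], float]:
    """Return per-predicate verdicts and the fraction of predicates satisfied."""
    verdicts = {p.description: satisfied(p, response) for p in preds}
    score = sum(verdicts.values()) / len(verdicts) if verdicts else 0.0
    return verdicts, score


if __name__ == "__main__":
    preds = [
        Predicate(PredicateType.CONTENT, "recommends Python", "Python"),
        Predicate(PredicateType.NUMERICAL, "gives 3 tips", 3),
    ]
    print(evaluate(preds, "I recommend PYTHON. Here are 3 tips."))
    print(evaluate(preds, "I recommend python. Here are three tips."))
```

In this sketch the second response still satisfies the content predicate despite the change in casing, but fails the numerical predicate because "three" is not an exact digit match. This mirrors, in a deliberately crude way, the lenient-versus-strict split the abstract describes.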

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications