Models Know Models Best: Evaluation via Model-Preferred Formats

Joonhak Lee, Sungmok Jung, Jongyeon Park, and Jaejin Lee

arXiv:2601.22699 · cs.CL · February 2, 2026

Open Access

TL;DR

This paper examines how the choice between symbol-based and cloze-style evaluation formats affects Large Language Model performance on multiple-choice tasks. It introduces a dynamic approach that selects the format for each problem instance from model-preference signals, improving zero-shot accuracy.

Contribution

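The paper presents a lightweight classifier, trained on latent model-preference signals, that chooses the evaluation format for each problem instance. Unlike human-designed heuristics, which often degrade performance, this approach improves zero-shot accuracy.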

Findings

01. Format choice significantly impacts LLM performance.

02. Model-preference signals can guide optimal format selection.

03. Dynamic format alignment improves zero-shot accuracy.

Abstract

Performance of Large Language Models (LLMs) on multiple-choice tasks differs markedly between symbol-based and cloze-style evaluation formats. The observed discrepancies are systematically attributable to task characteristics: natural language continuation benefits from likelihood scoring, whereas explicit comparison is better suited to symbol-based selection. These trends are consistent across various decoder-based LLMs, indicating model-agnostic effects. To address these inconsistencies, a dynamic format-alignment strategy is introduced that employs a lightweight classifier trained on latent model-preference signals. In contrast to human-designed heuristics, which often degrade performance, this approach uses model-generated signals to determine the optimal format for each problem instance. The proposed method achieves substantial and consistent improvements in zero-shot accuracy…
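
The difference between the two formats, and the idea of choosing one per instance, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the prompt layout, the `logprob` callable (assumed to return log P(continuation | prompt) under some model), the `toy_logprob` stand-in, and the `prefers_symbol` callable (standing in for the paper's classifier) are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: log P(continuation | prompt) under some model.
LogProbFn = Callable[[str, str], float]

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def cloze_predict(question: str, options: Sequence[str], logprob: LogProbFn) -> int:
    """Cloze-style: score each option's text as a continuation of the question."""
    scores = [logprob(question, " " + opt) for opt in options]
    return max(range(len(options)), key=scores.__getitem__)


def symbol_predict(question: str, options: Sequence[str], logprob: LogProbFn) -> int:
    """Symbol-based: list the options under letter labels, then score each label."""
    listing = "\n".join(f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options))
    prompt = f"{question}\n{listing}\nAnswer:"
    scores = [logprob(prompt, " " + LETTERS[i]) for i in range(len(options))]
    return max(range(len(options)), key=scores.__getitem__)


def aligned_predict(
    question: str,
    options: Sequence[str],
    logprob: LogProbFn,
    prefers_symbol: Callable[[str, Sequence[str]], bool],
) -> int:
    """Pick a format per instance; prefers_symbol is an assumed stand-in classifier."""
    predict = symbol_predict if prefers_symbol(question, options) else cloze_predict
    return predict(question, options, logprob)


def toy_logprob(prompt: str, continuation: str) -> float:
    """Toy scorer for demonstration only: likes the label 'B', else shorter text."""
    if continuation.strip() == "B":
        return 0.0
    return -float(len(continuation))
```

With the toy scorer, the same question can receive different answers under the two formats, which is the kind of discrepancy the abstract describes. In practice `logprob` would wrap a real model's token log-likelihoods, and cloze scores are often length-normalized, which this sketch omits.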

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods