Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Output Prefilling

Silvia Cappelletti; Tobia Poppi; Samuele Poppi; Zheng-Xin Yong; Diego Garcia-Olano; Marcella Cornia; Lorenzo Baraldi; Rita Cucchiara

arXiv:2505.15323·cs.CL·April 6, 2026

TL;DR

This paper introduces a prefilling technique to improve the accuracy and reliability of first-token probability methods in multiple-choice question answering with large language models, without modifying model parameters.

Contribution

The paper repurposes prefilling, a structured natural-language prefix prepended to the model output and originally explored in AI safety, to steer models towards valid answer options, improving FTP-based evaluation across benchmarks.

Findings

01. Prefilling improves accuracy, calibration, and consistency of FTP methods.

02. The approach outperforms standard FTP and rivals full decoding methods.

03. Prefilling is simple, robust, and low-cost for enhancing multiple-choice LLM evaluation.

Abstract

Large Language Models (LLMs) are increasingly evaluated on multiple-choice question answering (MCQA) tasks using *first-token probability* (FTP), which selects the answer option whose initial token has the highest likelihood. While efficient, FTP can be fragile: models may assign high probability to unrelated tokens (*misalignment*) or use a valid token merely as part of a generic preamble rather than as a clear answer choice (*misinterpretation*), undermining the reliability of symbolic evaluation. We propose a simple solution: the *prefilling attack*, a structured natural-language prefix (e.g., "*The correct option is:*") prepended to the model output. Prefilling was originally explored in AI safety; we repurpose it to steer the model to respond with a clean, valid option, without modifying its parameters. Empirically, the FTP-with-prefilling strategy substantially improves accuracy,…
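The FTP selection rule and the role of the prefix described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ftp_choice` and `build_prompt`, the plain-text prompt template, and both toy logit tables are assumptions made up for illustration, and the numbers are not results from the paper. A real evaluation would read next-token logits from a model and would need to handle tokenizer details such as leading spaces in option tokens.

```python
import math


def softmax(logits):
    """Convert a token -> logit mapping into a token -> probability mapping."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def ftp_choice(next_token_logits, option_labels):
    """Return the option label with the highest first-token probability.

    Also returns the total probability mass on valid labels; a low value
    suggests the model is about to emit something other than an option.
    """
    probs = softmax(next_token_logits)
    option_probs = {label: probs.get(label, 0.0) for label in option_labels}
    best = max(option_probs, key=option_probs.get)
    return best, sum(option_probs.values())


def build_prompt(question, options, prefill=""):
    """Render a hypothetical plain-text MCQA prompt.

    `prefill` is written at the start of the model's own turn, so the model
    continues from it instead of choosing how to open its answer.
    """
    lines = [f"Question: {question}"]
    lines += [f"{label}. {text}" for label, text in options.items()]
    lines.append(f"Assistant: {prefill}")
    return "\n".join(lines)


# Made-up logits for the first generated token WITHOUT a prefix: most of the
# mass sits on preamble tokens such as "The" or "Sure".
TOY_LOGITS_PLAIN = {"The": 3.0, "Sure": 2.0, "A": 1.0, "B": 0.5, "C": 0.2, "D": 0.1}

# Made-up logits for the first generated token AFTER "The correct option is:".
TOY_LOGITS_PREFILLED = {"B": 4.0, "A": 1.0, "C": 0.5, "D": 0.2, "The": 0.1}

if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    print(build_prompt("2 + 2 = ?", {"A": "3", "B": "4"}, "The correct option is:"))
    print(ftp_choice(TOY_LOGITS_PLAIN, labels))
    print(ftp_choice(TOY_LOGITS_PREFILLED, labels))
```

In the first toy table the option tokens together receive only a small share of the probability mass, so the FTP pick is made among tokens the model was unlikely to produce. In the second, the prefix has already supplied the preamble and nearly all of the mass falls on option labels, which is the behaviour the abstract attributes to prefilling.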
