Bi-Level Prompt Optimization for Multimodal LLM-as-a-Judge

Bo Pan; Xuan Kan; Kaitai Zhang; Yan Yan; Shunwen Tan; Zihao He; Zixin Ding; Junjie Wu; Liang Zhao

arXiv:2602.11340 · cs.AI · February 13, 2026

Open Access

TL;DR

This paper introduces BLPO, a bi-level prompt optimization framework that enhances multimodal LLM-based evaluation of AI-generated images by converting visual data into text, overcoming context limitations and improving alignment with human judgments.

Contribution

We propose a novel bi-level prompt optimization method that jointly refines judge and image-to-text prompts for better multimodal evaluation under context constraints.
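
The abstract as shown on this page is truncated and does not spell out the algorithm, so the following is only a toy sketch of what "jointly refining" two prompts at two levels can look like, not the authors' method. It assumes a fixed, hand-written candidate set for each prompt and searches it exhaustively; real prompt optimizers typically generate candidates with an LLM. The names `IMAGES`, `describe`, `judge`, `agreement`, and `bilevel_search`, as well as the prompts themselves, are hypothetical stand-ins for model calls and labeled data.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: each "image" is a dict of attributes plus a human label.
# Here the human label simply means "no artifacts".
IMAGES: List[Dict] = [
    {"sharp": True, "artifacts": False, "human": 1},
    {"sharp": True, "artifacts": True, "human": 0},
    {"sharp": False, "artifacts": False, "human": 1},
    {"sharp": False, "artifacts": True, "human": 0},
]


def describe(image: Dict, i2t_prompt: str) -> str:
    """Stub for an image-to-text model: the prompt decides what gets described."""
    if i2t_prompt == "describe sharpness":
        return "sharp" if image["sharp"] else "blurry"
    return "artifacts" if image["artifacts"] else "clean"


def judge(text: str, judge_prompt: str) -> int:
    """Stub for an LLM judge: accept (1) if the prompt's keyword appears in the text."""
    keyword = judge_prompt.split()[-1]  # "accept if clean" -> "clean"
    return 1 if keyword in text else 0


def agreement(judge_prompt: str, i2t_prompt: str) -> float:
    """Fraction of images where the judge's verdict matches the human label."""
    hits = sum(
        judge(describe(img, i2t_prompt), judge_prompt) == img["human"]
        for img in IMAGES
    )
    return hits / len(IMAGES)


def bilevel_search(
    judge_prompts: Sequence[str], i2t_prompts: Sequence[str]
) -> Tuple[str, str, float]:
    """Outer level picks the judge prompt; inner level picks the best
    image-to-text prompt for that judge prompt."""
    best = None
    for jp in judge_prompts:
        # Inner level: best image-to-text prompt given this judge prompt.
        ip = max(i2t_prompts, key=lambda p: agreement(jp, p))
        score = agreement(jp, ip)
        # Outer level: keep the judge prompt whose inner optimum scores highest.
        if best is None or score > best[2]:
            best = (jp, ip, score)
    return best


best = bilevel_search(
    ["accept if sharp", "accept if clean"],
    ["describe sharpness", "describe artifacts"],
)
print(best)
```

In this toy setup neither prompt helps on its own: the judge prompt only reaches full agreement once the image-to-text prompt surfaces the attribute it checks, which is the intuition behind optimizing the two together.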

Findings

01. BLPO improves evaluation accuracy across multiple datasets.

02. The method effectively converts images into textual cues.

03. BLPO outperforms existing prompt optimization techniques.

Abstract

Large language models (LLMs) have become widely adopted as automated judges for evaluating AI-generated content. Despite their success, aligning LLM-based evaluations with human judgments remains challenging. While supervised fine-tuning on human-labeled data can improve alignment, it is costly and inflexible, requiring new training for each task or dataset. Recent progress in auto prompt optimization (APO) offers a more efficient alternative by automatically improving the instructions that guide LLM judges. However, existing APO methods primarily target text-only evaluations and remain underexplored in multimodal settings. In this work, we study auto prompt optimization for multimodal LLM-as-a-judge, particularly for evaluating AI-generated images. We identify a key bottleneck: multimodal models can only process a limited number of visual examples due to context window constraints,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)