You Only Forward Once: An Efficient Compositional Judging Paradigm

Tianlong Zhang; Hongwei Xue; Shilin Yan; Di Wu; Chen Xu; Guannan Zhang; Yunyun Yang

arXiv:2511.16600 · cs.AI · February 3, 2026

TL;DR

YOFO introduces a fast, single-pass judging paradigm for multimodal large language models that efficiently verifies multiple structured requirements simultaneously, maintaining interpretability and achieving state-of-the-art results.

Contribution

It proposes a template-conditioned method that verifies all binary requirements in a single forward pass, significantly improving speed while preserving interpretability in multimodal judging tasks.
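The sketch below illustrates, under stated assumptions, what a requirement template could look like as a data structure. It lays out the item being judged, then one marked segment per requirement, and records the position of each requirement's final token so that the model's output at that position can be read later. The `[R1]`-style markers, the whitespace "tokenizer", and the names `Template` and `build_template` are hypothetical stand-ins, not the paper's actual template format.

```python
from dataclasses import dataclass


@dataclass
class Template:
    tokens: list[str]            # full token sequence fed to the model
    answer_positions: list[int]  # index of the final token of each requirement


def build_template(item: str, requirements: list[str]) -> Template:
    """Hypothetical layout: the item, then one marked segment per requirement.

    Whitespace splitting stands in for a real tokenizer (an assumption).
    """
    tokens = item.split()
    positions = []
    for i, req in enumerate(requirements, start=1):
        tokens += [f"[R{i}]"] + req.split()
        positions.append(len(tokens) - 1)  # last token of this requirement
    return Template(tokens, positions)


example = build_template("red cotton t-shirt", ["is red", "is made of wool"])
print(example.answer_positions)
```

Here the recorded positions are 5 and 10, the indices of "red" and "wool". With a real tokenizer a requirement usually spans several subword tokens, so the position would be taken after tokenization rather than from word counts.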

Findings

01. Achieves state-of-the-art performance on recommendation datasets.

02. Provides orders-of-magnitude speedup over autoregressive methods.

03. Supports dependency-aware analysis and benefits from post-hoc chain-of-thought.
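One way to see where the speedup comes from is to count model invocations. The sketch assumes an autoregressive judge that, after one prefill pass over the prompt, generates an analysis of roughly `tokens_per_analysis` tokens for each requirement, one decode pass per token. The function names and example numbers are illustrative, not measurements from the paper.

```python
def forward_passes_autoregressive(num_requirements: int, tokens_per_analysis: int) -> int:
    # One prefill pass over the prompt, then one decode pass per generated token.
    return 1 + num_requirements * tokens_per_analysis


def forward_passes_yofo() -> int:
    # All requirement decisions are read from a single pass over the template.
    return 1


# Hypothetical workload: 10 requirements, about 50 generated tokens each.
print(forward_passes_autoregressive(10, 50), forward_passes_yofo())
```

Pass counts are not wall-clock time: decode steps reuse a KV cache and are individually cheaper than a prefill, so the actual speedup depends on model size, hardware, and sequence lengths.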

Abstract

Multimodal large language models (MLLMs) show strong potential as judges. However, existing approaches face a fundamental trade-off: adapting MLLMs to output a single score misaligns with the generative nature of MLLMs and limits fine-grained requirement understanding, whereas autoregressively generating judging analyses is prohibitively slow in high-throughput settings. Observing that judgment reduces to verifying whether inputs satisfy a set of structured requirements, we propose YOFO, a template-conditioned method that judges all requirements in a single forward pass. Built on an autoregressive model, YOFO accepts a structured requirement template and, in one inference step, produces a binary yes/no decision for each requirement by reading the logits of the final token associated with that requirement. This design yields orders-of-magnitude speedups while preserving interpretability.…
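The decision rule described in the abstract can be sketched as follows, assuming the model has already returned logits for every position of the template in one forward pass. For each requirement, the sketch compares the "yes" and "no" logits at that requirement's final-token position. Restricting the softmax to those two tokens yields a probability that could serve as a soft score. The function name, the `yes_id`/`no_id` token ids, and the toy logits are hypothetical; with a real model they would come from its tokenizer and forward-pass output.

```python
import math


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def judge_requirements(logits: list[list[float]], answer_positions: list[int],
                       yes_id: int, no_id: int) -> list[tuple[bool, float]]:
    """Return (decision, p_yes) per requirement from ONE forward pass.

    logits[pos][token_id]: vocabulary scores at each template position.
    answer_positions: index of the final token of each requirement.
    """
    results = []
    for pos in answer_positions:
        yes, no = logits[pos][yes_id], logits[pos][no_id]
        # Two-way softmax over {yes, no}: exp(yes) / (exp(yes) + exp(no)).
        p_yes = _sigmoid(yes - no)
        results.append((yes > no, p_yes))
    return results


# Toy logits over a 3-token vocabulary (hypothetical ids: yes=1, no=2).
toy_logits = [
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],   # position 1: "yes" beats "no"
    [0.0, 0.0, 0.0],
    [0.0, -1.0, 1.0],  # position 3: "no" beats "yes"
]
print(judge_requirements(toy_logits, answer_positions=[1, 3], yes_id=1, no_id=2))
```

On the toy input, the first requirement is judged "yes" with p_yes of about 0.88 and the second "no" with p_yes of about 0.12. Because every decision is read from the same set of logits, adding requirements lengthens the template but does not add forward passes.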

Code & Models

Models

Accio-Lab/yofo-Qwen3-VL-2B-Instruct (Hugging Face model)

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications