You Only Forward Once: An Efficient Compositional Judging Paradigm
Tianlong Zhang, Hongwei Xue, Shilin Yan, Di Wu, Chen Xu, Guannan Zhang, Yunyun Yang

TL;DR
YOFO is a single-pass judging paradigm for multimodal large language models. It verifies multiple structured requirements in one forward pass, preserves interpretability, and achieves state-of-the-art results on recommendation datasets.
Contribution
The paper proposes a template-conditioned method that produces a binary yes/no decision for every requirement in a single forward pass, substantially improving speed while preserving interpretability in multimodal judging tasks.
Findings
Achieves state-of-the-art performance on recommendation datasets.
Delivers orders-of-magnitude speedups over autoregressively generating judging analyses.
Supports dependency-aware analysis and benefits from post-hoc chain-of-thought.
Abstract
Multimodal large language models (MLLMs) show strong potential as judges. However, existing approaches face a fundamental trade-off: adapting MLLMs to output a single score misaligns with the generative nature of MLLMs and limits fine-grained requirement understanding, whereas autoregressively generating judging analyses is prohibitively slow in high-throughput settings. Observing that judgment reduces to verifying whether inputs satisfy a set of structured requirements, we propose YOFO, a template-conditioned method that judges all requirements in a single forward pass. Built on an autoregressive model, YOFO accepts a structured requirement template and, in one inference step, produces a binary yes/no decision for each requirement by reading the logits of the final token associated with that requirement. This design yields orders-of-magnitude speedups while preserving interpretability.…
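The readout step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `judge_requirements`, the token ids `YES_ID` and `NO_ID`, the requirement names, and the hand-written `toy_logits` matrix are all hypothetical, and comparing only the yes and no logits with a two-way softmax is an assumption, since the abstract says only that the decision is read from the logits of each requirement's final token. In a real system the logits would come from one forward pass of the MLLM over the filled-in requirement template.

```python
import math
from typing import Dict, Sequence


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def judge_requirements(
    logits: Sequence[Sequence[float]],
    answer_positions: Dict[str, int],
    yes_id: int,
    no_id: int,
    threshold: float = 0.5,
) -> Dict[str, dict]:
    """Read a yes/no verdict per requirement from one logits matrix.

    logits: [sequence_length][vocab_size] scores from a single forward pass.
    answer_positions: requirement name -> index of the final token of that
        requirement in the template (hypothetical bookkeeping).
    """
    results = {}
    for name, pos in answer_positions.items():
        row = logits[pos]
        # Assumption: a softmax restricted to the yes/no tokens, which
        # equals a sigmoid of the logit difference.
        p_yes = stable_sigmoid(row[yes_id] - row[no_id])
        results[name] = {"p_yes": p_yes, "satisfied": p_yes >= threshold}
    return results


# Toy stand-in for model output: 4 positions, vocabulary of 4 tokens.
YES_ID, NO_ID = 1, 2  # hypothetical token ids
toy_logits = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 2.0, -1.0, 0.3],  # final token of requirement "matches_color"
    [0.0, 0.0, 0.0, 0.0],
    [0.2, -0.5, 1.5, 0.0],  # final token of requirement "matches_category"
]
verdicts = judge_requirements(
    toy_logits,
    {"matches_color": 1, "matches_category": 3},
    YES_ID,
    NO_ID,
)
for name, verdict in verdicts.items():
    print(name, verdict)
```

Because every verdict is read from the same logits matrix, adding requirements lengthens the template but adds no decoding steps, and the per-requirement probabilities remain available for inspection.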
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
