MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling

Zhong Li; Qi Huang; Yuxuan Zhu; Mohammad Mohammadi Amiri; Niki van Stein; Thomas Bäck; Matthijs van Leeuwen; Zaiwen Wen; Lincen Yang

arXiv:2605.12154·cs.AI·May 13, 2026

TL;DR

This paper introduces MM-OptBench, a multimodal optimization modeling benchmark with 780 solver-verified instances, for evaluating how well large language models generate optimization formulations and solver code from combined text and visual problem specifications.

Contribution

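The paper contributes a multimodal benchmark setting, in which models must produce both a mathematical formulation and executable solver code from a text-and-visual problem specification, and a solver-grounded framework that generates structured optimization instances and verifies each with an exact solver.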

Findings

01 Best models achieve around 52% pass@1 on easy instances

02 General-purpose models have low success rates on hard instances

03 Math-specialized models do not solve any instances in the benchmark
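
For readers unfamiliar with the pass@1 metric quoted in the findings, the sketch below shows one common way to compute pass@k. It assumes the unbiased estimator widely used in code-generation evaluation; the paper's exact evaluation protocol is not shown on this page, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one instance from n sampled attempts, c of them correct.

    Assumption: this is the standard unbiased estimator
    1 - C(n - c, k) / C(n, k), not necessarily the paper's exact protocol.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect attempts: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over instances; each result is an (n, c) pair."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to the fraction of correct attempts, so a pass@1 of around 52% means that roughly half of the single-attempt outputs were judged correct on the easy instances.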

Abstract

Optimization modeling translates real decision-making problems into mathematical optimization models and solver-executable implementations. Although language models are increasingly used to generate optimization formulations and solver code, existing benchmarks are almost entirely text-only. This omits many optimization-modeling tasks that arise in operational practice, where requirements are described in text but instance information is conveyed through visual artifacts such as tables, graphs, maps, schedules, and dashboards. We introduce multimodal optimization modeling, a benchmark setting in which models must construct both a mathematical formulation and executable solver code from a text-and-visual problem specification. To evaluate this setting, we develop a solver-grounded framework that generates structured optimization instances, verifies each with an exact solver, and builds…
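
To make the idea of solver-grounded verification concrete, here is a minimal sketch under stated assumptions. It uses a tiny 0/1 knapsack instance, exhaustive enumeration as a stand-in for an exact solver, and a greedy heuristic as a stand-in for a flawed model-generated program. None of these names, instances, or tolerances come from the paper; they are purely illustrative.

```python
from itertools import product


def brute_force_knapsack(values, weights, capacity):
    """Exact optimum of a tiny 0/1 knapsack by enumerating every selection.

    Stand-in for an exact solver; only practical for a handful of items.
    """
    best = 0
    for choice in product((0, 1), repeat=len(values)):
        weight = sum(w * x for w, x in zip(weights, choice))
        if weight <= capacity:
            best = max(best, sum(v * x for v, x in zip(values, choice)))
    return best


def greedy_knapsack(values, weights, capacity):
    """Hypothetical flawed candidate: picks items by value/weight ratio."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    total_value, remaining = 0, capacity
    for i in order:
        if weights[i] <= remaining:
            total_value += values[i]
            remaining -= weights[i]
    return total_value


def matches_reference(candidate, reference, rel_tol=1e-6):
    """Accept a candidate objective only if it agrees with the verified optimum."""
    return abs(candidate - reference) <= rel_tol * max(1.0, abs(reference))


# Hypothetical instance data, as might be read from a table in the problem image.
values, weights, capacity = [10, 13, 7], [3, 4, 2], 6
reference = brute_force_knapsack(values, weights, capacity)
candidate = greedy_knapsack(values, weights, capacity)
print(reference, candidate, matches_reference(candidate, reference))
```

On this instance the exact optimum is 20 while the greedy candidate reaches only 17, so the check rejects it. Comparing against a solver-verified optimum, rather than inspecting the formulation text, is what lets an automated grader catch a plausible-looking but incorrect model.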
