GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs

Xiaorong Zhu; Ziheng Jia; Jiarui Wang; Xiangyu Zhao; Haodong Duan; Xiongkuo Min; Jia Wang; Zicheng Zhang; Guangtao Zhai

arXiv:2506.00991 · cs.CV · August 6, 2025

TL;DR

GOBench is a new benchmark that systematically evaluates Multi-modality Large Language Models' abilities in geometric optics generation and understanding, revealing significant challenges and gaps in current models.

Contribution

This paper introduces GOBench, the first comprehensive benchmark for assessing MLLMs' geometric optics capabilities through generation and understanding tasks.

Findings

01 Current models struggle with optical generation and understanding.

02 GPT-4o-Image shows limitations in generating authentic optical imagery.

03 Gemini-2.5Pro achieves only 37.35% accuracy in optical understanding.

Abstract

The rapid evolution of Multi-modality Large Language Models (MLLMs) is driving significant advancements in visual understanding and generation. Nevertheless, a comprehensive assessment of their capabilities with respect to fine-grained physical principles, especially in geometric optics, is still lacking. To address this gap, we introduce GOBench, the first benchmark to systematically evaluate MLLMs' abilities across two tasks: 1) Generating Optically Authentic Imagery and 2) Understanding Underlying Optical Phenomena. We curate high-quality prompts of geometric optical scenarios and use MLLMs to construct the GOBench-Gen-1k dataset. We then organize subjective experiments to assess the generated imagery based on Optical Authenticity, Aesthetic Quality, and Instruction Fidelity, revealing MLLMs' generation flaws that violate optical principles. For the understanding task, we apply crafted…
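The abstract names three rating dimensions for the generated imagery. As a rough illustration of how such subjective ratings could be summarized, the sketch below averages scores per model and dimension. It is not the authors' protocol: the function name `mean_scores`, the dimension identifiers, the tuple layout, and the example ratings are all assumptions made for this sketch, and the rating scale used in the paper is not stated here.

```python
from collections import defaultdict
from statistics import mean

# Dimension names follow the abstract; the identifiers are hypothetical.
DIMENSIONS = ("optical_authenticity", "aesthetic_quality", "instruction_fidelity")


def mean_scores(ratings):
    """Average ratings per (model, dimension).

    `ratings` is an iterable of (model, dimension, score) tuples.
    This layout is an assumption, not the benchmark's actual file format.
    """
    buckets = defaultdict(list)
    for model, dimension, score in ratings:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        buckets[(model, dimension)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


# Made-up ratings, for illustration only (not results from the paper).
example = [
    ("model_a", "optical_authenticity", 2),
    ("model_a", "optical_authenticity", 4),
    ("model_a", "aesthetic_quality", 5),
]
result = mean_scores(example)
```

Keeping the three dimensions separate, rather than collapsing them into one score, makes it possible to see a model that produces attractive images which nonetheless violate optical principles.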

Code & Models

Datasets

bonnot/GOBench
dataset · 69 downloads