Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
Yixin Zhu, Zixiong Wang, Jian Yang, Jin Xie, Jingyi Yu, Jiayuan Gu, Beibei Wang

TL;DR
This paper introduces VISER, a visually realistic benchmark for evaluating robot manipulation in simulation. It targets the sim-to-real visual gap so that simulation results predict real-world performance more reliably.
Contribution
The paper presents a new benchmark built on a dataset of over 1,000 3D assets with physically-based rendering (PBR) materials, together with an automated pipeline for scalable, realistic simulation evaluation.
Findings
VISER achieves a Pearson correlation coefficient of 0.92 between simulation and real-world performance.
The benchmark includes diverse tasks like grasping, placing, and long-horizon activities.
Physically-based rendering and material-aware segmentation improve simulation realism.
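The Pearson correlation cited in the findings measures how linearly simulated success rates track real-world ones across a set of policies or tasks. The sketch below shows how such a coefficient could be computed. The function name `pearson_r` and the success-rate values are hypothetical, chosen only for illustration; they are not data or code from the paper.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical success rates for four policies (illustrative only, not from the paper).
sim_success = [0.20, 0.45, 0.60, 0.85]
real_success = [0.15, 0.50, 0.55, 0.80]

r = pearson_r(sim_success, real_success)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means that policies ranking higher in simulation also tend to rank higher on the real robot, which is the property a simulation benchmark needs in order to serve as a proxy for real-world evaluation.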
Abstract
Reliable simulation evaluation of robot manipulation policies serves as a high-fidelity proxy for real-world performance. Although existing benchmarks cover a wide range of task categories, they lack visual realism, creating a large domain gap between simulation and reality. This undermines the reliability of simulation-based evaluation in predicting real-world performance. To mitigate the sim-to-real visual gap, we conduct a systematic analysis to isolate the effects of lighting and material. Our results show that these factors play a critical role in geometric reasoning and spatial grounding, yet are largely overlooked in existing benchmarks. Motivated by the analysis, we propose VISER, a visually realistic benchmark for evaluating robot manipulation in simulation. VISER features a high-fidelity dataset of over 1,000 3D assets with physically-based rendering (PBR) materials, along…
