Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?
Jiahe Jin, Yanheng He, Mingyan Yang

TL;DR
This paper critiques current 3D LLM benchmarks for being susceptible to 2D-Cheating, proposes improved evaluation principles, and emphasizes separating 3D capabilities from lower-dimensional aspects to better assess true 3D understanding.
Contribution
The paper identifies the 2D-Cheating problem in 3D LLM evaluation, proposes principles for more accurate assessment, and advocates explicitly separating 3D abilities from 1D or 2D aspects.
Findings
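VLMs given only rendered images of point clouds can often solve tasks in existing 3D LLM benchmarks, which exposes flaws in how those benchmarks evaluate 3D capabilities.
Current benchmarks may therefore fail to measure the 3D capabilities that are unique to 3D LLMs.
The paper proposes principles, informed by VLM performance on these benchmarks, for better assessing genuine 3D understanding.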
Abstract
In this work, we identify the "2D-Cheating" problem in 3D LLM evaluation, where these tasks might be easily solved by VLMs with rendered images of point clouds, exposing ineffective evaluation of 3D LLMs' unique 3D capabilities. We test VLM performance across multiple 3D LLM benchmarks and, using this as a reference, propose principles for better assessing genuine 3D understanding. We also advocate explicitly separating 3D abilities from 1D or 2D aspects when evaluating 3D LLMs. Code and data are available at https://github.com/LLM-class-group/Revisiting-3D-LLM-Benchmarks
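To make "rendered images of point clouds" concrete, here is a minimal sketch of how a colored point cloud can be flattened into a 2D image that a VLM could consume. It is an illustration under stated assumptions, not the authors' rendering pipeline: the function name `render_point_cloud`, the top-down orthographic view, and the image size are all hypothetical choices made for this example.

```python
import numpy as np

def render_point_cloud(points, colors, size=64):
    """Orthographic top-down render of a colored point cloud (hypothetical helper).

    points: (N, 3) array of x, y, z coordinates.
    colors: (N, 3) array of RGB values in 0..255.
    Returns a (size, size, 3) uint8 image; pixels with no point stay black.
    """
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.uint8)

    # Normalize x, y into pixel indices in [0, size - 1].
    xy = points[:, :2]
    lo = xy.min(axis=0)
    span = np.maximum(xy.max(axis=0) - lo, 1e-9)  # avoid division by zero
    pix = ((xy - lo) / span * (size - 1)).round().astype(int)

    image = np.zeros((size, size, 3), dtype=np.uint8)
    # Paint points from lowest to highest z so the highest point wins each pixel.
    # x maps to the column and y to the row (y grows downward in the image).
    for i in np.argsort(points[:, 2], kind="stable"):
        col, row = pix[i]
        image[row, col] = colors[i]
    return image

# Tiny example: two points share a pixel; the higher one (green) is visible.
pts = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
cols = [[255, 0, 0], [0, 255, 0], [0, 0, 255]]
img = render_point_cloud(pts, cols, size=4)
```

The projection discards depth along the viewing axis, so any benchmark question that remains answerable from such an image alone does not require 3D input, which is the concern the abstract raises.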
