AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality
Uttamasha Monjoree, Wei Yan

TL;DR
This study evaluates GPT-4's spatial reasoning abilities in understanding 3D rotations using visual and textual data, highlighting its limitations and potential improvements with augmented reality and supplementary information.
Contribution
It demonstrates GPT-4's limited innate spatial understanding and shows how additional visual and textual cues can enhance its comprehension of 3D rotations.
Findings
GPT-4 struggles to understand spatial rotations without supplementary information.
Adding coordinate axes and mathematical representations improves GPT-4's accuracy.
AR visualizations combined with textual explanations enhance AI's spatial reasoning.
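To make the "coordinate axes and mathematical representations" finding concrete, the sketch below shows one common way to express a 3D rotation numerically: as a rotation matrix about a coordinate axis. This is an illustrative assumption, not the authors' method; this summary does not say which representation the paper supplied to GPT-4. The helper names (`rotation_matrix`, `compose`, `rotate_point`) are hypothetical.

```python
import math

def rotation_matrix(axis, degrees):
    """3x3 right-handed rotation matrix about the 'x', 'y', or 'z' axis.

    Assumption: rotations are counterclockwise when looking from the
    positive end of the axis toward the origin.
    """
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    if axis == "z":
        return [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    raise ValueError("axis must be 'x', 'y', or 'z'")

def compose(second, first):
    """Matrix product second @ first: apply `first`, then `second`."""
    return [[sum(second[i][k] * first[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]

def rotate_point(matrix, point):
    """Apply a rotation matrix to a 3D point, rounding away float noise."""
    return tuple(round(sum(matrix[i][k] * point[k] for k in range(3)), 9) + 0.0
                 for i in range(3))

# A PSVT:R-style two-step rotation: 90 degrees about z, then 90 degrees about x.
step_1 = rotation_matrix("z", 90)
step_2 = rotation_matrix("x", 90)
combined = compose(step_2, step_1)

after_first = rotate_point(step_1, (1, 0, 0))    # (0.0, 1.0, 0.0)
after_both = rotate_point(combined, (1, 0, 0))   # (0.0, 0.0, 1.0)
```

A representation like this states explicitly what a diagram leaves implicit: which axis the object turns about, by how much, and in what order the steps are applied.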
Abstract
Spatial intelligence is important in Architecture, Construction, Science, Technology, Engineering, and Mathematics (STEM), and Medicine. Understanding three-dimensional (3D) spatial rotations can involve verbal descriptions and visual or interactive examples that illustrate how objects change orientation in 3D space. Recent studies show that Artificial Intelligence (AI) with language and vision capabilities still faces limitations in spatial reasoning. In this paper, we study generative AI's capability to understand rotations of objects using its image and language processing features. We examined the spatial intelligence of the GPT-4 model with vision in understanding the spatial rotation process, using diagrams based on the Revised Purdue Spatial Visualization Test: Visualization of Rotations (Revised PSVT:R). Next, we incorporated a layer of coordinate system axes on…
Taxonomy
Topics: 3D Modeling in Geospatial Applications · Augmented Reality Applications
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Residual Connection
