An Eye for an AI: Evaluating GPT-4o's Visual Perception Skills and Geometric Reasoning Skills Using Computer Graphics Questions

Tony Haoran Feng (1); Paul Denny (1); Burkhard C. Wünsche (1); Andrew Luxton-Reilly (1); Jacqueline Whalley (2) ((1) University of Auckland; (2) Auckland University of Technology)

arXiv:2410.16991·cs.AI·October 23, 2024

TL;DR

This paper evaluates GPT-4o's abilities in visual perception and geometric reasoning through computer graphics questions, revealing its potential and limitations, and offers strategies for educators to integrate GenAI into teaching.

Contribution

It introduces two new datasets for assessing LMMs on CG questions and evaluates GPT-4o's performance, highlighting its strengths and areas for improvement in educational contexts.

Findings

01

GPT-4o can independently solve visual CG questions but with accuracy limitations.

02

Major challenges remain in the quality of generated results.

03

The paper proposes novel teaching strategies for incorporating GenAI despite its current limitations.

Abstract

CG (Computer Graphics) is a popular field of CS (Computer Science), but many students find it difficult because it requires a wide range of skills, such as mathematics, programming, geometric reasoning, and creativity. Over the past few years, researchers have investigated ways to harness the power of GenAI (Generative Artificial Intelligence) to improve teaching. In CS, much of the research has focused on introductory computing. A recent study evaluating the performance of an LLM (Large Language Model), GPT-4 (text-only), on CG questions indicated poor performance and a reliance on detailed descriptions of image content, which often required considerable insight from the user to return reasonable results. So far, no studies have investigated the abilities of LMMs (Large Multimodal Models), or multimodal LLMs, to solve CG questions and how these abilities can be used to improve…

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Multi-Head Attention · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Byte Pair Encoding