CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing

Cheng Qian; Hyeonjeong Ha; Jiayu Liu; Jeonghwan Kim; Jiateng Liu; Bingxuan Li; Aditi Tiwari; Dwip Dalal; Zhenhailong Wang; Xiusi Chen; Mahdi Namazifar; Yunzhu Li; Heng Ji

arXiv:2605.02910 · cs.AI · May 7, 2026

TL;DR

CreativityBench is a new benchmark that evaluates large language models' ability to creatively repurpose objects by reasoning about their affordances, revealing current limitations in creative problem-solving.

Contribution

The paper introduces CreativityBench, a benchmark built on a large-scale affordance knowledge base, and uses it to evaluate LLMs' creative reasoning, highlighting their struggles with affordance discovery and physical mechanism understanding.

Findings

01. Models often select plausible objects but struggle with identifying correct parts and affordances.
02. Scaling models does not significantly improve creative affordance reasoning.
03. Chain-of-Thought reasoning yields limited gains in creative tasks.
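
The first finding separates three levels at which a response can be right or wrong: the object, the part, and the affordance. One way to make that distinction concrete is a nested check, sketched below. The `Answer` fields, the `score_levels` function, and the rule that a deeper level only counts when the shallower ones match are assumptions made for illustration, not the paper's evaluation protocol.

```python
from typing import NamedTuple


class Answer(NamedTuple):
    obj: str         # selected object, e.g. "spoon"
    part: str        # part of the object that is used
    affordance: str  # action the part is used for


def score_levels(pred: Answer, gold: Answer) -> dict[str, bool]:
    """Hypothetical nested scoring: a level counts only if all shallower levels match."""
    obj_ok = pred.obj == gold.obj
    part_ok = obj_ok and pred.part == gold.part
    aff_ok = part_ok and pred.affordance == gold.affordance
    return {"object": obj_ok, "part": part_ok, "affordance": aff_ok}


# Toy example, invented for illustration: right object, wrong part.
gold = Answer("spoon", "handle", "pry")
pred = Answer("spoon", "bowl", "pry")
result = score_levels(pred, gold)
```

Under this scheme, a response like `pred` would be credited at the object level only, which is the pattern the finding describes.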

Abstract

Recent advances in large language models have led to strong performance on reasoning and environment-interaction tasks, yet their ability for creative problem-solving remains underexplored. We study this capability through the lens of creative tool use, where a model repurposes available objects by reasoning about their affordances and attributes rather than relying on canonical usage. As a first step, we introduce CreativityBench, a benchmark for evaluating affordance-based creativity in LLMs. To this end, we build a large-scale affordance knowledge base (KB) with 4K entities and 150K+ affordance annotations, explicitly linking objects, parts, attributes, and actionable uses. Building on this KB, we generate 14K grounded tasks that require identifying non-obvious yet physically plausible solutions under constraints. Evaluations across 10 state-of-the-art LLMs, including closed and…
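
The abstract describes the knowledge base as linking objects, parts, attributes, and actionable uses. A minimal sketch of how such a record might be represented and queried follows. The class names (`Entity`, `Part`, `Affordance`), the field names, the `canonical` flag, and the example data are all assumptions, not the schema of the released dataset.

```python
from dataclasses import dataclass, field


@dataclass
class Affordance:
    action: str              # e.g. "pry", "scoop"
    canonical: bool = False  # hypothetical flag: True if this is the object's usual use


@dataclass
class Part:
    name: str
    attributes: list[str] = field(default_factory=list)
    affordances: list[Affordance] = field(default_factory=list)


@dataclass
class Entity:
    name: str
    parts: list[Part] = field(default_factory=list)


def find_repurposing_candidates(kb: list[Entity], action: str) -> list[tuple[str, str]]:
    """Return (entity, part) pairs whose part affords `action` as a non-canonical use."""
    hits = []
    for entity in kb:
        for part in entity.parts:
            if any(a.action == action and not a.canonical for a in part.affordances):
                hits.append((entity.name, part.name))
    return hits


# Toy data, invented for illustration only.
kb = [
    Entity("spoon", [
        Part("handle", ["rigid", "flat"], [Affordance("pry")]),
        Part("bowl", ["concave"], [Affordance("scoop", canonical=True)]),
    ]),
    Entity("screwdriver", [
        Part("shaft", ["rigid", "thin"],
             [Affordance("pry"), Affordance("turn screw", canonical=True)]),
    ]),
]

candidates = find_repurposing_candidates(kb, "pry")
```

Keeping affordances on the part rather than on the whole object mirrors the object, part, attribute, use linkage in the abstract: a query for a needed action returns the specific part that makes the repurposing plausible.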

Code & Models

Repositories

creativitybench/CreativityBench
github

Datasets

chengq9/CreativityBench
dataset · 90 dl
