CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing
Cheng Qian, Hyeonjeong Ha, Jiayu Liu, Jeonghwan Kim, Jiateng Liu, Bingxuan Li, Aditi Tiwari, Dwip Dalal, Zhenhailong Wang, Xiusi Chen, Mahdi Namazifar, Yunzhu Li, Heng Ji

TL;DR
CreativityBench is a new benchmark that evaluates large language models' ability to creatively repurpose objects by reasoning about their affordances, revealing current limitations in creative problem-solving.
Contribution
The paper introduces CreativityBench, a benchmark built on a large-scale affordance knowledge base, and uses it to evaluate LLMs' creative reasoning, highlighting their struggles with affordance discovery and with understanding physical mechanisms.
Findings
Models often select plausible objects but struggle with identifying correct parts and affordances.
Scaling models does not significantly improve creative affordance reasoning.
Chain-of-Thought reasoning yields limited gains in creative tasks.
Abstract
Recent advances in large language models have led to strong performance on reasoning and environment-interaction tasks, yet their ability for creative problem-solving remains underexplored. We study this capability through the lens of creative tool use, where a model repurposes available objects by reasoning about their affordances and attributes rather than relying on canonical usage. As a first step, we introduce CreativityBench, a benchmark for evaluating affordance-based creativity in LLMs. To this end, we build a large-scale affordance knowledge base (KB) with 4K entities and 150K+ affordance annotations, explicitly linking objects, parts, attributes, and actionable uses. Building on this KB, we generate 14K grounded tasks that require identifying non-obvious yet physically plausible solutions under constraints. Evaluations across 10 state-of-the-art LLMs, including closed and…
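To make the idea of a knowledge base that links objects, parts, attributes, and actionable uses more concrete, here is a minimal sketch in Python. It is not the authors' schema: the class names (`Entity`, `Part`), the `canonical_use` field, the `find_repurposable` helper, and the toy entries are all hypothetical, chosen only to illustrate how a non-canonical use might be looked up through a part's affordances.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Part:
    # Hypothetical record: one part of an object, with its physical
    # attributes and the actions those attributes make possible.
    name: str
    attributes: set[str] = field(default_factory=set)
    affordances: set[str] = field(default_factory=set)


@dataclass
class Entity:
    # Hypothetical record: an object, its usual purpose, and its parts.
    name: str
    canonical_use: str
    parts: list[Part] = field(default_factory=list)


def find_repurposable(entities: list[Entity], needed_affordance: str) -> list[tuple[str, str]]:
    """Return (entity, part) name pairs whose part affords the needed action,
    skipping entities whose canonical use is already that action."""
    matches = []
    for entity in entities:
        if entity.canonical_use == needed_affordance:
            continue  # canonical usage is not a repurposing
        for part in entity.parts:
            if needed_affordance in part.affordances:
                matches.append((entity.name, part.name))
    return matches


# Toy data, invented for illustration only.
kb = [
    Entity("credit card", "pay", [Part("edge", {"thin", "rigid"}, {"scrape", "pry"})]),
    Entity("scraper", "scrape", [Part("blade", {"flat", "rigid"}, {"scrape"})]),
    Entity("shoe", "wear", [Part("lace", {"flexible", "long"}, {"tie", "bind"})]),
]

print(find_repurposable(kb, "scrape"))  # [('credit card', 'edge')]
```

The lookup goes through parts rather than whole objects, which mirrors the reported finding that models often pick a plausible object but fail to name the part and affordance that make the solution work.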
