Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text
Christopher Clark, Jordi Salvador, Dustin Schwenk, Derrick Bonafilia, Mark Yatskar, Eric Kolve, Alvaro Herrasti, Jonghyun Choi, Sachin Mehta, Sam Skjonsberg, Carissa Schoenick, Aaron Sarnat, Hannaneh Hajishirzi, Aniruddha Kembhavi, Oren Etzioni, Ali Farhadi

TL;DR
Iconary is a collaborative game combining drawing and guessing that challenges AI to understand and generate multimodal communication involving language, visual metaphors, and iconography, serving as a benchmark for multimodal AI research.
Contribution
The paper introduces Iconary, a novel Pictionary-based game for testing multimodal communication in AI, along with models trained on extensive human gameplay data.
Findings
Models achieve skillful gameplay but lag behind humans in drawing tasks.
The dataset and models serve as a new benchmark for multimodal communication.
AI models can incorporate world knowledge to interpret unseen words.
Abstract
Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, that poses a novel challenge for the research community. In Iconary, a Guesser tries to identify a phrase that a Drawer is drawing by composing icons, and the Drawer iteratively revises the drawing to help the Guesser in response. This back-and-forth often uses canonical scenes, visual metaphor, or icon compositions to express challenging words, making it an ideal test for mixing language and visual/symbolic communication in AI. We propose models to play Iconary and train them on over 55,000 games between human players.…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
