Can They Dixit? Yes they Can! Dixit as a Playground for Multimodal Language Model Capabilities

Nishant Balepur; Dang Nguyen; Dayeon Ki

arXiv:2510.19892·cs.CL·October 24, 2025

Open Access

TL;DR

This paper introduces a game-based evaluation framework using the Dixit card game to assess multimodal large language models' capabilities, providing a holistic and objective measure that correlates with traditional benchmarks.

Contribution

It proposes a novel, game-based evaluation method for MLMs using Dixit, addressing limitations of existing benchmarks and subjective comparisons.

Findings

01

Dixit win-rate rankings align with traditional MLM benchmarks.

02

Games reveal differences in strategies between humans and models.

03

Framework offers a robust, engaging evaluation approach.

Abstract

Multi-modal large language models (MLMs) are often assessed on static, individual benchmarks -- which cannot jointly assess MLM capabilities in a single task -- or rely on human or model pairwise comparisons -- which are highly subjective, expensive, and allow models to exploit superficial shortcuts (e.g., verbosity) to inflate their win-rates. To overcome these issues, we propose game-based evaluations to holistically assess MLM capabilities. Games require multiple abilities for players to win, are inherently competitive, are governed by fixed, objective rules, and make evaluation more engaging, providing a robust framework to address the aforementioned challenges. We manifest this evaluation specifically through Dixit, a fantasy card game where players must generate captions for a card that trick some, but not all players, into selecting the played card. Our quantitative…
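
For readers unfamiliar with Dixit, the "trick some, but not all players" incentive can be sketched as a scoring function. The sketch below follows the standard tabletop rules as commonly published (storyteller scores nothing if every voter or no voter picks their card); it is an assumption that the paper's setup scores rounds this way, and the function name `score_dixit_round` and its argument layout are hypothetical, chosen only for illustration.

```python
def score_dixit_round(storyteller, played_by, votes):
    """Score one Dixit round under the standard tabletop rules (assumed here).

    storyteller: id of the player who wrote the caption
    played_by:   dict mapping card id -> id of the player who played it
    votes:       dict mapping voter id -> card id they picked
                 (the storyteller does not vote)
    """
    players = sorted(set(played_by.values()))
    scores = {p: 0 for p in players}

    # The card the storyteller actually captioned.
    storyteller_card = next(c for c, p in played_by.items() if p == storyteller)
    correct = [v for v, c in votes.items() if c == storyteller_card]

    if len(correct) == 0 or len(correct) == len(votes):
        # Caption was too obscure or too obvious:
        # storyteller gets 0, every other player gets 2.
        for p in players:
            if p != storyteller:
                scores[p] += 2
    else:
        # Some, but not all, voters found the card:
        # storyteller and each correct voter get 3.
        scores[storyteller] += 3
        for v in correct:
            scores[v] += 3

    # Bonus: each non-storyteller earns 1 point per vote their decoy attracted.
    for card in votes.values():
        owner = played_by[card]
        if owner != storyteller:
            scores[owner] += 1

    return scores
```

With four players where `A` is the storyteller, a call such as `score_dixit_round("A", {"a": "A", "b": "B", "c": "C", "d": "D"}, {"B": "a", "C": "b", "D": "b"})` rewards `A` for a caption that only `B` decoded, and rewards `B` both for the correct guess and for a decoy that drew two votes. The all-or-nothing branch is what makes captions that are too literal or too cryptic a losing strategy.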

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems