Can They Dixit? Yes they Can! Dixit as a Playground for Multimodal Language Model Capabilities
Nishant Balepur, Dang Nguyen, Dayeon Ki

TL;DR
This paper introduces a game-based evaluation framework using the Dixit card game to assess multimodal large language models' capabilities, providing a holistic and objective measure that correlates with traditional benchmarks.
Contribution
It proposes a novel, game-based evaluation method for MLMs using Dixit, addressing limitations of existing benchmarks and subjective comparisons.
Findings
Dixit win-rate rankings align with traditional MLM benchmarks.
Games reveal differences in strategies between humans and models.
Framework offers a robust, engaging evaluation approach.
Abstract
Multi-modal large language models (MLMs) are often assessed on static, individual benchmarks -- which cannot jointly assess MLM capabilities in a single task -- or via human or model pairwise comparisons -- which are highly subjective, expensive, and allow models to exploit superficial shortcuts (e.g., verbosity) to inflate their win-rates. To overcome these issues, we propose game-based evaluations to holistically assess MLM capabilities. Games require multiple abilities for players to win, are inherently competitive, are governed by fixed, objective rules, and make evaluation more engaging, providing a robust framework to address the aforementioned challenges. We manifest this evaluation specifically through Dixit, a fantasy card game where players must generate captions for a card that trick some, but not all, players into selecting the played card. Our quantitative…
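The "trick some, but not all players" objective is easiest to see in the scoring rule. The sketch below follows the standard tabletop Dixit scoring; whether the paper uses exactly this scoring is an assumption, since the abstract above is truncated. The function name `score_round` and the `owners`/`votes` data layout are hypothetical, chosen only for illustration.

```python
def score_round(storyteller, owners, votes):
    """Score one Dixit round under the standard tabletop rules (assumed).

    storyteller: id of the player who wrote the caption
    owners:      dict mapping card id -> player who played that card
    votes:       dict mapping each non-storyteller player -> card id they chose
    """
    players = set(owners.values())
    scores = {p: 0 for p in players}
    story_card = next(c for c, p in owners.items() if p == storyteller)
    guessers = [p for p in players if p != storyteller]

    for voter, card in votes.items():
        if owners[card] == voter:
            raise ValueError(f"{voter} may not vote for their own card")

    correct = [p for p in guessers if votes[p] == story_card]

    if len(correct) in (0, len(guessers)):
        # Caption was too obscure or too obvious: the storyteller scores
        # nothing and every other player scores 2.
        for p in guessers:
            scores[p] += 2
    else:
        # Some, but not all, players found the card: the storyteller and
        # each correct guesser score 3.
        scores[storyteller] += 3
        for p in correct:
            scores[p] += 3

    # Bonus: each vote that lands on a non-storyteller's decoy card
    # gives that card's owner 1 point.
    for voter, card in votes.items():
        owner = owners[card]
        if owner != storyteller:
            scores[owner] += 1

    return scores
```

For example, with four players where A is the storyteller, B and D pick A's card, and C picks B's decoy, A and D score 3, B scores 4 (3 for guessing plus 1 for the decoy vote), and C scores 0. A caption that everyone or no one decodes earns the storyteller nothing, which is why captions must be neither literal nor opaque.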
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems
