$\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles

Trishanu Das; Abhilash Nandy; Khush Bajaj; Deepiha S

arXiv:2511.01340·cs.CV·November 4, 2025

TL;DR

This paper introduces a large, diverse benchmark for evaluating vision-language models on Rebus Puzzles, and proposes a reasoning framework that improves model performance significantly.

Contribution

The paper presents $\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$, a comprehensive Rebus Puzzle benchmark, and introduces RebusDescProgICE, a reasoning framework that enhances model accuracy on this task.

Findings

01

Benchmark contains 1,333 puzzles across 18 categories.

02

RebusDescProgICE improves performance by 2.1-4.1% for closed-source models and 20-30% for open-source models.

03

Models show improved understanding of complex, multi-step reasoning tasks.

Abstract

Understanding Rebus Puzzles (puzzles that use pictures, symbols, and letters to represent words or phrases creatively) requires a variety of skills such as image recognition, cognitive skills, commonsense reasoning, multi-step reasoning, and image-based wordplay, making this a challenging task even for current Vision-Language Models. In this paper, we present $\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$, a large and diverse benchmark of 1,333 English Rebus Puzzles containing different artistic styles and levels of difficulty, spread across 18 categories such as food, idioms, sports, finance, and entertainment. We also propose RebusDescProgICE, a model-agnostic framework which uses a combination of an unstructured description and code-based, structured reasoning, along with better, reasoning-based in-context example selection, improving the performance of…
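The abstract mentions reasoning-based in-context example selection without giving details here. The sketch below is one possible reading of that idea, not the authors' method. It ranks a pool of solved puzzles by how similar their reasoning text is to a draft reasoning for the new puzzle, then keeps the top k as in-context examples. The `Example` schema, the function names, the Jaccard token-overlap score (a simple stand-in for whatever similarity measure the paper uses), and the three toy puzzles are all assumptions made for illustration; none of them come from the paper or the benchmark.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    """A solved puzzle usable as an in-context example (hypothetical schema)."""
    description: str  # free-text description of the puzzle image
    reasoning: str    # short reasoning trace explaining the solution
    answer: str


def tokens(text: str) -> set[str]:
    """Lowercased word set with edge punctuation stripped (crude stand-in for an embedding)."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(query_reasoning: str, pool: list[Example], k: int = 2) -> list[Example]:
    """Return the k pool examples whose reasoning text best overlaps the query reasoning."""
    q = tokens(query_reasoning)
    ranked = sorted(pool, key=lambda ex: jaccard(q, tokens(ex.reasoning)), reverse=True)
    return ranked[:k]


# Toy pool of illustrative puzzles (not taken from the benchmark).
pool = [
    Example("The word MAN written above the word BOARD",
            "one word is placed over another word, so read it as X over Y",
            "man overboard"),
    Example("The word TOWN written with its letters stacked vertically",
            "the letters are written vertically going down, so prefix the word with down",
            "downtown"),
    Example("The word STAND written below the letter I",
            "one word is placed under another word, so read it as Y under X",
            "I understand"),
]

# Draft reasoning for a new, unsolved puzzle.
query = "one word is placed over another word"
selected = select_examples(query, pool, k=2)
```

Selecting on reasoning text rather than on surface description is what makes the two positional puzzles rank above the vertical-lettering one in this toy pool. A real system would likely replace the token overlap with a learned embedding similarity.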

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling