Benchmarking Affordance Generalization with BusyBox

Dean Fortier; Timothy Adamson; Tess Hellebrekers; Teresa LaScala; Kofi Ennin; Michael Murray; Andrey Kolobov; Galen Mullins

arXiv:2602.05441·cs.RO·February 6, 2026

TL;DR

This paper introduces BusyBox, a physical benchmark for evaluating how well vision-language-action (VLA) models generalize affordances, that is, manipulate new objects with familiar physical features, across reconfigured variants of the same device. The results show that this remains difficult even for strong VLAs.

Contribution

The paper presents BusyBox, a new physical benchmark whose modules can be swapped and rotated to produce many variants, enabling systematic, semi-automatic evaluation of affordance generalization in VLAs. It also provides open resources for the research community.

Findings

01. Generalization across BusyBox variants is highly challenging for strong VLAs.

02. BusyBox is easy to build and replicate in most robotics labs.

03. The dataset includes language-annotated demonstrations with a mobile robot.

Abstract

Vision-Language-Action (VLA) models have been attracting the attention of researchers and practitioners thanks to their promise of generalization. Although single-task policies still offer competitive performance, VLAs are increasingly able to handle commands and environments unseen in their training set. While generalization in vision and language space is undoubtedly important for robust versatile behaviors, a key meta-skill VLAs need to possess is affordance generalization -- the ability to manipulate new objects with familiar physical features. In this work, we present BusyBox, a physical benchmark for systematic semi-automatic evaluation of VLAs' affordance generalization. BusyBox consists of 6 modules with switches, sliders, wires, buttons, a display, and a dial. The modules can be swapped and rotated to create a multitude of BusyBox variations with different visual appearances…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics