SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios

Jieru Lin; Zhiwei Yu; Börje F. Karlsson

arXiv:2511.17649·cs.CV·March 3, 2026

TL;DR

SWITCH is a comprehensive benchmark designed to evaluate embodied agents' abilities to interact with tangible control interfaces in long-horizon, real-world scenarios, addressing key challenges like causality, partial observability, and verification.

Contribution

The paper introduces SWITCH, a novel benchmark with diverse tasks and real device interactions, filling a gap in long-term embodied agent evaluation and providing resources for future research.

Findings

01

Commercial and open LMMMs show systematic failures on SWITCH tasks.

02

SWITCH reveals critical gaps in current embodied agent capabilities.

03

Benchmark resources are publicly available for community use.

Abstract

Autonomous agents operating in the real world must interact continuously with existing physical and semantic infrastructure, track delayed consequences, and verify outcomes over time. Everyday environments are rich in tangible control interfaces (TCIs), e.g., light switches, appliance panels, and embedded GUIs, which pose core challenges for lifelong embodied agents, including partial observability, causal reasoning across time, and failure-aware verification under real-world constraints. Yet current benchmarks rarely consider such long-horizon interaction and causality requirements. We introduce SWITCH (Semantic World Interface Tasks for Control & Handling), an embodied, task-driven benchmark created through iterative releases to probe these gaps. Its first iteration, SWITCH-Basic, evaluates five complementary abilities: task-aware VQA, semantic UI grounding, action generation, state…

Code & Models

Datasets

BAAI-Agents/SWITCH-Basic-v1-open
dataset · 2.4k downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Context-Aware Activity Recognition Systems