SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios
Jieru Lin, Zhiwei Yu, Börje F. Karlsson

TL;DR
SWITCH is a comprehensive benchmark designed to evaluate embodied agents' abilities to interact with tangible control interfaces in long-horizon, real-world scenarios, addressing key challenges like causality, partial observability, and verification.
Contribution
The paper introduces SWITCH, a novel benchmark with diverse tasks and real device interactions, filling a gap in long-horizon embodied agent evaluation and providing resources for future research.
Findings
Commercial and open LMMs show systematic failures on SWITCH tasks.
SWITCH reveals critical gaps in current embodied agent capabilities.
Benchmark resources are publicly available for community use.
Abstract
Autonomous agents operating in the real world must interact continuously with existing physical and semantic infrastructure, track delayed consequences, and verify outcomes over time. Everyday environments are rich in tangible control interfaces (TCIs), e.g., light switches, appliance panels, and embedded GUIs, posing core challenges for lifelong embodied agents, including partial observability, causal reasoning across time, and failure-aware verification under real-world constraints. Yet, current benchmarks rarely consider such long-horizon interaction and causality requirements. We introduce SWITCH (Semantic World Interface Tasks for Control & Handling), an embodied, task-driven benchmark created through iterative releases to probe these gaps. Its first iteration, SWITCH-Basic, evaluates five complementary abilities: task-aware VQA, semantic UI grounding, action generation, state…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Context-Aware Activity Recognition Systems
