PersonalHomeBench: Evaluating Agents in Personalized Smart Homes
Manasa Bharadwaj, Yolanda Liu, InJung Yang, Sungil Kim, Nikhil Verma, KoKeun Kim, Kevin Ferreira, YoungJoon Kim

TL;DR
PersonalHomeBench is a benchmark for evaluating foundation models as agentic assistants in personalized smart home environments, using personalized, context-dependent tasks generated from rich household states.
Contribution
The paper introduces the PersonalHomeBench benchmark together with PersonalHomeTools, a toolbox for household information retrieval, appliance control, and situational understanding that supports realistic agent-environment interaction.
Findings
Performance drops as task complexity increases.
Pronounced failures in counterfactual reasoning.
Difficulty under partial observability, where tasks require tool use.
Abstract
Agentic AI systems are rapidly advancing toward real-world applications, yet their readiness in complex and personalized environments remains insufficiently characterized. To address this gap, we introduce PersonalHomeBench, a benchmark for evaluating foundation models as agentic assistants in personalized smart home environments. The benchmark is constructed through an iterative process that progressively builds rich household states, which are then used to generate personalized, context-dependent tasks. To support realistic agent-environment interaction, we provide PersonalHomeTools, a comprehensive toolbox enabling household information retrieval, appliance control, and situational understanding. PersonalHomeBench evaluates both reactive and proactive agentic abilities under unimodal and multimodal observations. Thorough experimentation reveals a systematic performance reduction as…
