PersonalHomeBench: Evaluating Agents in Personalized Smart Homes

Manasa Bharadwaj; Yolanda Liu; InJung Yang; Sungil Kim; Nikhil Verma; KoKeun Kim; Kevin Ferreira; YoungJoon Kim

arXiv:2604.16813·cs.AI·May 15, 2026

TL;DR

PersonalHomeBench is a new benchmark designed to evaluate foundation models acting as agentic assistants in personalized smart home environments, focusing on complex, context-dependent tasks and agent capabilities.

Contribution

The paper introduces a benchmark (PersonalHomeBench) and an accompanying toolbox (PersonalHomeTools) for assessing agentic AI in personalized smart homes, emphasizing realistic agent-environment interaction and complex reasoning.

Findings

01. Performance drops as task complexity increases.

02. Pronounced failures in counterfactual reasoning.

03. Challenges under partial observability requiring tool use.

Abstract

Agentic AI systems are rapidly advancing toward real-world applications, yet their readiness in complex and personalized environments remains insufficiently characterized. To address this gap, we introduce PersonalHomeBench, a benchmark for evaluating foundation models as agentic assistants in personalized smart home environments. The benchmark is constructed through an iterative process that progressively builds rich household states, which are then used to generate personalized, context-dependent tasks. To support realistic agent-environment interaction, we provide PersonalHomeTools, a comprehensive toolbox enabling household information retrieval, appliance control, and situational understanding. PersonalHomeBench evaluates both reactive and proactive agentic abilities under unimodal and multimodal observations. Thorough experimentation reveals a systematic performance reduction as…
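The abstract describes a toolbox covering household information retrieval and appliance control. The following is a minimal sketch of what that kind of agent-environment interaction could look like, not the PersonalHomeTools API: the names `Household`, `ToolBox`, `get_state`, `set_state`, and `cool_if_hot`, along with the device names and the temperature threshold, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Household:
    """Hypothetical household state: device name -> attribute dict."""
    devices: dict = field(default_factory=dict)


class ToolBox:
    """Illustrative stand-in for a smart home toolbox (assumed interface)."""

    def __init__(self, home):
        self.home = home
        self.call_log = []  # records every tool call in order

    def get_state(self, device):
        """Information retrieval: read a copy of one device's attributes."""
        self.call_log.append(("get_state", device))
        return dict(self.home.devices[device])

    def set_state(self, device, **attrs):
        """Appliance control: update one device's attributes."""
        self.call_log.append(("set_state", device))
        self.home.devices[device].update(attrs)


def cool_if_hot(tools, threshold_c=26.0):
    """Toy reactive policy. The agent has no direct view of the temperature,
    so it queries the thermostat through a tool before deciding to act."""
    reading = tools.get_state("thermostat")
    if reading["temperature_c"] > threshold_c:
        tools.set_state("air_conditioner", power="on")
        return True
    return False


home = Household(devices={
    "thermostat": {"temperature_c": 29.5},
    "air_conditioner": {"power": "off"},
})
tools = ToolBox(home)
acted = cool_if_hot(tools)
print(acted, home.devices["air_conditioner"]["power"], tools.call_log)
```

Keeping a call log is one possible way to let an evaluator check that the agent retrieved the information it needed before acting, which is the kind of behavior partial observability is meant to probe.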
