LifeAgentBench: A Multi-dimensional Benchmark and Agent for Personal Health Assistants in Digital Health
Ye Tian, Zihao Wang, Onat Gungor, Xiaoran Fan, Tajana Rosing

TL;DR
LifeAgentBench is a comprehensive benchmark for evaluating large language models in personalized digital health support, highlighting current limitations and proposing a new multi-step agent to improve health reasoning in real-world scenarios.
Contribution
The paper introduces LifeAgentBench, a large-scale benchmark for health reasoning, and proposes LifeAgent, a multi-step agent that enhances LLM performance in digital health tasks.
Findings
An evaluation of 11 leading LLMs reveals reasoning bottlenecks, notably in long-horizon aggregation.
LifeAgent outperforms baseline models in complex health reasoning tasks.
The benchmark and agent are publicly available for future research.
Abstract
Personalized digital health support requires long-horizon, cross-dimensional reasoning over heterogeneous lifestyle signals, and recent advances in mobile sensing and large language models (LLMs) make such support increasingly feasible. However, the capabilities of current LLMs in this setting remain unclear due to the lack of systematic benchmarks. In this paper, we introduce LifeAgentBench, a large-scale QA benchmark for long-horizon, cross-dimensional, and multi-user lifestyle health reasoning, containing 22,573 questions spanning from basic retrieval to complex reasoning. We release an extensible benchmark construction pipeline and a standardized evaluation protocol to enable reliable and scalable assessment of LLM-based health assistants. We then systematically evaluate 11 leading LLMs on LifeAgentBench and identify key bottlenecks in long-horizon aggregation and cross-dimensional…
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Topic Modeling
