CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare
Akash Ghosh, Tajamul Ashraf, Rishu Kumar Singh, Numan Saeed, Sriparna Saha, Xiuying Chen, Salman Khan

TL;DR
CarePilot is a multi-agent system for automating long-horizon, complex healthcare tasks. It uses iterative learning and reasoning to address the limitations of existing models in medical workflows.
Contribution
We introduce CareFlow, a comprehensive healthcare workflow benchmark, and develop CarePilot, a novel multi-agent framework that improves long-horizon reasoning in medical automation tasks.
Findings
CarePilot outperforms existing models by over 15% on benchmark tasks.
The framework effectively handles multi-step, domain-specific medical workflows.
Iterative simulation enhances the robustness and reasoning capabilities of the system.
Abstract
Multimodal agentic pipelines are transforming human-computer interaction by enabling efficient and accessible automation of complex, real-world tasks. However, recent efforts have focused on short-horizon or general-purpose applications (e.g., mobile or desktop interfaces), leaving long-horizon automation for domain-specific systems, particularly in healthcare, largely unexplored. To address this, we introduce CareFlow, a high-quality human-annotated benchmark comprising complex, long-horizon software workflows across medical annotation tools, DICOM viewers, EHR systems, and laboratory information systems. On this benchmark, existing vision-language models (VLMs) perform poorly, struggling with long-horizon reasoning and multi-step interactions in medical contexts. To overcome this, we propose CarePilot, a multi-agent framework based on the actor-critic paradigm. The Actor integrates…
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)
