WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
Léo Boisvert, Megh Thakkar, Maxime Gasse, Massimo Caccia, Thibault Le Sellier De Chezelles, Quentin Cappart, Nicolas Chapados, Alexandre Lacoste, Alexandre Drouin

TL;DR
WorkArena++ introduces a comprehensive benchmark with 682 realistic enterprise tasks to evaluate and improve the planning, reasoning, and problem-solving abilities of large language models and vision-language models for workplace automation.
Contribution
The paper presents a new benchmark, WorkArena++, for assessing LLMs and VLMs on enterprise tasks, along with a method to generate ground-truth traces for model fine-tuning.
Findings
State-of-the-art models face challenges in enterprise task performance.
The benchmark reveals gaps in the reasoning and planning capabilities of these models.
It also provides a scalable way to generate training data for model improvement.
Abstract
The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though recent LLMs seem capable of planning and reasoning given user instructions, their effectiveness in applying these capabilities for autonomous task solving remains underexplored. This is especially true in enterprise settings, where automated agents hold the promise of a high impact. To fill this gap, we propose WorkArena++, a novel benchmark consisting of 682 tasks corresponding to realistic workflows routinely performed by knowledge workers. WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents. Our empirical studies across state-of-the-art LLMs and vision-language models (VLMs), as well as human workers, reveal several challenges for such models…
Taxonomy
Topics: Business Process Modeling and Analysis · Semantic Web and Ontologies · Service-Oriented Architecture and Web Services
