From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production
Segev Shlomov, Alon Oved, Sami Marreed, Ido Levy, Offer Akrabi, Avi Yaeli, Łukasz Strąk, Elizabeth Koumpan, Yinon Goldshtein, Eilam Shapira, Nir Mashkif, Asaf Adi

TL;DR
This paper presents IBM's development and pilot deployment of a generalist agent, CUGA, in enterprise settings, demonstrating its potential for scalable, cost-effective automation with state-of-the-art benchmark performance and enterprise-specific evaluation.
Contribution
It introduces CUGA, a hierarchical generalist agent tailored for enterprise use, and provides early evidence of its effectiveness and lessons learned from real-world deployment.
Findings
CUGA achieves state-of-the-art performance on AppWorld and WebArena.
In an enterprise pilot, CUGA approaches the accuracy of specialized agents.
Preliminary results suggest reduced development time and costs.
Abstract
Agents are rapidly advancing in automating digital work, but enterprises face a harder challenge: moving beyond prototypes to deployed systems that deliver measurable business value. This path is complicated by fragmented frameworks, slow development, and the absence of standardized evaluation practices. Generalist agents have emerged as a promising direction, excelling on academic benchmarks and offering flexibility across task types, applications, and modalities. Yet, evidence of their use in production enterprise settings remains limited. This paper reports IBM's experience developing and piloting the Computer Using Generalist Agent (CUGA), which has been open-sourced for the community (https://github.com/cuga-project/cuga-agent). CUGA adopts a hierarchical planner-executor architecture with strong analytical foundations, achieving state-of-the-art performance on AppWorld and…
