CABENCH: Benchmarking Composable AI for Solving Complex Tasks through Composing Ready-to-Use Models

Tung-Thuy Pham, Duy-Quan Luong, Minh-Quan Duong, Trung-Hieu Nguyen, Thu-Trang Nguyen, Son Nguyen, and Hieu Dinh Vo

arXiv:2508.02427·cs.AI·August 5, 2025

TL;DR

CABENCH is a benchmark of 70 realistic tasks and a curated pool of 700 models for evaluating composable AI systems, which assemble ready-to-use models to solve complex tasks.

Contribution

This paper introduces the first public benchmark for composable AI, including a diverse set of tasks, models, and an evaluation framework for systematic assessment.

Findings

01 Composable AI shows promise in solving complex real-world problems.

02 Current approaches are outperformed by human-designed solutions in some tasks.

03 Automating the generation of execution pipelines remains a key challenge.

Abstract

Composable AI offers a scalable and effective paradigm for tackling complex AI tasks by decomposing them into sub-tasks and solving each sub-task using ready-to-use well-trained models. However, systematically evaluating methods under this setting remains largely unexplored. In this paper, we introduce CABENCH, the first public benchmark comprising 70 realistic composable AI tasks, along with a curated pool of 700 models across multiple modalities and domains. We also propose an evaluation framework to enable end-to-end assessment of composable AI solutions. To establish initial baselines, we provide human-designed reference solutions and compare their performance with two LLM-based approaches. Our results illustrate the promise of composable AI in addressing complex real-world problems while highlighting the need for methods that can fully unlock its potential by automatically…
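
The decompose-then-compose idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy functions stand in for pre-trained models, and the names `MODEL_POOL`, `Step`, and `run_pipeline` are hypothetical and do not come from CABENCH or its evaluation framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-ins for ready-to-use models. Real pool entries would be
# pre-trained models; these toy functions only mimic their input/output shape.
def toy_translate(text: str) -> str:
    """Pretend French-to-English translator backed by a tiny lexicon."""
    lexicon = {"bonjour": "hello", "monde": "world"}
    return " ".join(lexicon.get(word, word) for word in text.lower().split())


def toy_sentiment(text: str) -> str:
    """Pretend sentiment classifier: a greeting counts as positive."""
    return "positive" if "hello" in text else "neutral"


# Assumed model pool: maps a model name to a callable.
MODEL_POOL: Dict[str, Callable[[str], str]] = {
    "toy_translate": toy_translate,
    "toy_sentiment": toy_sentiment,
}


@dataclass
class Step:
    """One sub-task and the name of the pool model chosen to solve it."""
    subtask: str
    model: str


def run_pipeline(steps: List[Step], task_input: str) -> str:
    """Run the steps in order, feeding each model's output to the next."""
    data = task_input
    for step in steps:
        data = MODEL_POOL[step.model](data)
    return data


# A hand-written decomposition of "classify the sentiment of a French text".
pipeline = [
    Step("translate to English", "toy_translate"),
    Step("classify sentiment", "toy_sentiment"),
]
result = run_pipeline(pipeline, "Bonjour monde")
print(result)
```

Here the pipeline is written by hand, which loosely corresponds to a human-designed reference solution; an automated approach would have to choose both the sub-tasks and the pool models itself.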

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Ethics and Social Impacts of AI