Benchmarking Compound AI Applications for Hardware-Software Co-Design

Paramuth Samuthrsindh; Angel Cervantes; Varun Gohil; Gohar Irfan Chaudhry; Christina Delimitrou; Adam Belay

arXiv:2604.09593 · cs.DC · April 14, 2026

TL;DR

This paper introduces a benchmarking suite for Compound AI applications, enabling analysis of their complex deployment configurations to improve hardware-software co-design and resource efficiency.

Contribution

It presents a standardized benchmarking suite for Compound AI applications, facilitating cross-stack analysis and guiding system design improvements.

Findings

1. Derived key design principles for hardware-software co-design.
2. Identified factors influencing application performance and resource consumption.
3. Provided insights into optimizing deployment configurations.

Abstract

Compound AI applications, composed from interactions between Large Language Models (LLMs), Machine Learning (ML) models, external tools, and data sources, are quickly becoming an integral workload in datacenters. Their diverse sub-components and use cases present a large configuration space across the deployment stack, ranging from applications and serving software down to hardware, and each layer may influence application performance, deployment cost, and/or resource consumption. Despite their rapid adoption, however, the systems community lacks a standardized benchmark for analyzing this complicated design space and guiding system design. In this work, we present our benchmarking suite for cross-stack analysis of Compound AI applications. Using this suite, we derive key takeaways and design principles spanning several layers of the stack for hardware-software co-design to…
