Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models

Jae-Won Chung; Jeff J. Ma; Jisang Ahn; Yizhuo Liang; Akshay Jajoo; Myungjin Lee; Mosharaf Chowdhury

arXiv:2603.12118·cs.LG·April 29, 2026

TL;DR

Cornserve is a flexible, distributed system for efficiently serving Any-to-Any multimodal models, handling arbitrary combinations of input and output modalities with higher throughput and lower tail latency.

Contribution

It introduces a novel task abstraction and execution model for scalable, flexible serving of complex multimodal models on Kubernetes.
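To make the task abstraction concrete, here is a minimal sketch. It assumes a model built from modality encoders, a shared LLM, and modality generators. The classes UnitTask and AnyToAnyTask, the component names, and the replica counts are hypothetical illustrations, not Cornserve's API. The sketch shows two ideas from the paper's description: requests with different modalities visit different components, and each component carries its own scaling setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitTask:
    """Hypothetical: one model component, deployed and scaled independently."""
    name: str
    replicas: int = 1  # each component gets its own replica count


@dataclass
class AnyToAnyTask:
    """Hypothetical: composes components and picks a path per request."""
    encoders: dict[str, UnitTask]    # input modality -> encoder
    llm: UnitTask                    # shared language-model backbone
    generators: dict[str, UnitTask]  # output modality -> generator

    def plan(self, inputs: list[str], outputs: list[str]) -> list[str]:
        """Ordered component names a request with these modalities visits.

        In this sketch text is handled by the LLM itself, so only modalities
        with a registered encoder or generator add a step to the path.
        """
        path = [self.encoders[m].name for m in inputs if m in self.encoders]
        path.append(self.llm.name)
        path += [self.generators[m].name for m in outputs if m in self.generators]
        return path


model = AnyToAnyTask(
    encoders={"image": UnitTask("image_encoder", replicas=2),
              "audio": UnitTask("audio_encoder")},
    llm=UnitTask("llm", replicas=4),
    generators={"audio": UnitTask("speech_generator")},
)

print(model.plan(["text", "image"], ["text"]))   # ['image_encoder', 'llm']
print(model.plan(["audio"], ["text", "audio"]))  # ['audio_encoder', 'llm', 'speech_generator']
```

In this toy setup an image-to-text request never reaches the speech generator, which is why it makes sense to give each component its own replica count rather than scaling the whole model as one unit.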

Findings

01. Supports diverse Any-to-Any models with high throughput
02. Achieves up to 3.81× higher throughput
03. Reduces tail latency by up to 5.79×

Abstract

Any-to-Any models are an emerging class of multimodal models that accept combinations of multimodal data (e.g., text, image, video, audio) as input and generate them as output. Serving these models is challenging: requests with different input and output modalities traverse different paths through the model's computation graph, and each component of the model has different scaling characteristics. We present Cornserve, a distributed serving system for generic Any-to-Any models. Cornserve provides a flexible task abstraction for expressing Any-to-Any model computation graphs, enabling component disaggregation and independent scaling. The distributed runtime dispatches compute to the data plane via an efficient record-and-replay execution model that keeps track of data dependencies, and forwards tensor data between components directly from the producer to the consumer. Built…
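The record-and-replay execution model mentioned in the abstract can be illustrated with a toy, single-process sketch. This is one plausible reading, not Cornserve's implementation. All names (Ref, Invocation, Recorder, replay, image_to_speech, and the component names) are hypothetical, and the components are string-building stand-ins rather than models. Recording runs the task logic once and captures which component consumes which output. Replaying executes that trace and hands each producer's result straight to its consumers, loosely mirroring the paper's description of tensor data flowing directly from producer to consumer.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Ref:
    """Hypothetical placeholder for a value a recorded call will produce."""
    producer: int  # index of the producing invocation in the trace


@dataclass
class Invocation:
    component: str
    args: tuple  # literal values and/or Ref placeholders


class Recorder:
    """Record phase: component calls are logged with their inputs, not run."""

    def __init__(self) -> None:
        self.trace: list[Invocation] = []

    def call(self, component: str, *args: Any) -> Ref:
        self.trace.append(Invocation(component, args))
        return Ref(len(self.trace) - 1)


def replay(trace: list[Invocation],
           components: dict[str, Callable[..., Any]],
           output: Ref) -> Any:
    """Replay phase: run invocations in recorded order.

    A Ref can only point at an earlier entry, so the recorded order is
    already a valid dependency order. Each producer's result is passed
    directly to the invocations that reference it.
    """
    results: list[Any] = []
    for inv in trace:
        args = [results[a.producer] if isinstance(a, Ref) else a for a in inv.args]
        results.append(components[inv.component](*args))
    return results[output.producer]


def image_to_speech(rt: Recorder, prompt: str, image: str) -> Ref:
    """Hypothetical task logic, written once against the runtime."""
    emb = rt.call("image_encoder", image)
    text = rt.call("llm", prompt, emb)
    return rt.call("speech_generator", text)


# Stand-in components that only build strings, to make the data flow visible.
components = {
    "image_encoder": lambda img: f"emb({img})",
    "llm": lambda prompt, emb: f"llm({prompt}, {emb})",
    "speech_generator": lambda text: f"wav({text})",
}

rec = Recorder()
out = image_to_speech(rec, "describe", "cat.png")
print([inv.component for inv in rec.trace])  # ['image_encoder', 'llm', 'speech_generator']
print(replay(rec.trace, components, out))    # wav(llm(describe, emb(cat.png)))
```

Separating the two phases means the full dependency structure is known before any component executes; in this sketch that structure is simply the Ref links stored in rec.trace.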
