A System Architecture for Low Latency Multiprogramming Quantum Computing

Yilun Zhao; Yu Chen; Kaiyan Chang; He Li; Bing Li; Yinhe Han; Ying Wang

arXiv:2601.01158·cs.AR·January 6, 2026

TL;DR

This paper introduces FLAMENCO, a system that enables low-latency, high-fidelity multiprogramming in quantum computing by using offline compilation and a streamlined runtime orchestrator, overcoming online compilation bottlenecks.

Contribution

FLAMENCO's architecture allows independent offline compilation and dynamic region selection, significantly reducing runtime latency and improving fidelity in multiprogramming quantum workloads.
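The split between offline compilation and runtime region selection can be illustrated with a minimal sketch. It is not FLAMENCO's implementation: the `CompiledVersion` record, the `select_version` helper, the unit indices, and the fidelity numbers are all hypothetical, and the greedy "best free unit" policy is an assumption chosen for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompiledVersion:
    """One offline-compiled executable bound to one compute unit (hypothetical)."""
    program: str
    unit: int            # index of the compute unit this version targets
    est_fidelity: float  # fidelity estimate recorded at compile time (made up)


def select_version(versions, program, busy_units):
    """Return the highest-fidelity precompiled version whose unit is free, or None."""
    candidates = [
        v for v in versions
        if v.program == program and v.unit not in busy_units
    ]
    if not candidates:
        return None  # every targeted unit is busy; the caller must queue the job
    return max(candidates, key=lambda v: v.est_fidelity)


# Offline phase: each program is compiled independently for several units.
VERSIONS = [
    CompiledVersion("qnn", unit=0, est_fidelity=0.91),
    CompiledVersion("qnn", unit=1, est_fidelity=0.88),
    CompiledVersion("qnn", unit=2, est_fidelity=0.84),
    CompiledVersion("vqe", unit=0, est_fidelity=0.79),
    CompiledVersion("vqe", unit=2, est_fidelity=0.82),
]

# Runtime phase: placing a job is a table lookup, not a compilation.
busy = set()
for job in ["qnn", "vqe", "qnn"]:
    choice = select_version(VERSIONS, job, busy)
    if choice is not None:
        busy.add(choice.unit)
        print(job, "-> unit", choice.unit)
```

The point of the sketch is that no program is recompiled when its co-runners change; the runtime only picks among versions that already exist.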

Findings

01. Achieves over 5× runtime speedup compared to state-of-the-art baselines.

02. Removes online compilation overhead for quantum multiprogramming.

03. Maintains high device utilization and fidelity as concurrency increases.

Abstract

As quantum systems scale, Multiprogramming Quantum Computing (MPQC) becomes essential to improve device utilization and throughput. However, current MPQC pipelines rely on expensive online compilation to co-optimize concurrently running programs, because quantum executables are device-dependent, non-portable across qubit regions, and highly susceptible to noise and crosstalk. This online step dominates runtime and impedes low-latency deployment of practical future workloads, such as repeatedly invoked Quantum Neural Network (QNN) services. We present FLAMENCO, a fidelity-aware multi-version compilation system that enables independent offline compilation and low-latency, high-fidelity multiprogramming at runtime. At the architecture level, FLAMENCO abstracts devices into compute units to drastically shrink the search space of region allocation. At compile time, it…
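To give a feel for why coarser allocation units shrink the region search space, the toy count below compares qubit-level and unit-level choices. The abstract does not define how FLAMENCO forms its compute units, so the 4x4 grid, the 4-qubit program size, and the disjoint 2x2 tiles are assumptions made purely for illustration.

```python
from itertools import combinations

ROWS, COLS, K = 4, 4, 4  # toy 4x4 grid device, 4-qubit programs (assumed sizes)


def neighbors(q):
    """Yield the grid neighbours of qubit q, with qubits numbered row-major."""
    r, c = divmod(q, COLS)
    if r > 0:
        yield q - COLS
    if r < ROWS - 1:
        yield q + COLS
    if c > 0:
        yield q - 1
    if c < COLS - 1:
        yield q + 1


def is_connected(region):
    """Return True if the qubits in `region` form one connected patch."""
    region = set(region)
    start = next(iter(region))
    seen, stack = {start}, [start]
    while stack:
        for n in neighbors(stack.pop()):
            if n in region and n not in seen:
                seen.add(n)
                stack.append(n)
    return seen == region


# Qubit-level allocation: every connected K-qubit region is a candidate.
qubit_level = sum(
    is_connected(region) for region in combinations(range(ROWS * COLS), K)
)

# Unit-level allocation: the device is pre-partitioned into disjoint 2x2 tiles.
unit_level = (ROWS // 2) * (COLS // 2)

print("connected 4-qubit regions:", qubit_level)
print("2x2 compute units:", unit_level)
```

Even on this small grid the qubit-level candidate set is much larger than the handful of tiles, and the gap widens as the device and the number of concurrent programs grow.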

Taxonomy

Topics: Quantum Computing Algorithms and Architecture · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques