Reclaiming Idle CPU Cycles on Kubernetes: Sparse-Domain Multiplexing for Concurrent MPI-CFD Simulations
Tianfang Xie

TL;DR
This paper introduces a multiplexing framework for Kubernetes that reclaims CPU cycles left idle at synchronization barriers during MPI-CFD simulations. Co-locating simulations on the same cluster raises throughput to 1.77x with two simulations and 3.74x with five, with minimal overhead.
Contribution
The paper presents a multiplexing approach that combines PMPI-based duty-cycle profiling, proportional CPU allocation, a dynamic controller, and a single-parameter analytical model to run multiple MPI simulations concurrently on shared Kubernetes clusters.
Findings
1.77x throughput with two concurrent simulations
3.74x throughput with five concurrent simulations, with a knee at N=3
Automated pipeline with no manual intervention or pod restarts
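The throughput figures above can be turned into a per-simulation efficiency, which makes the diminishing returns at higher N easier to see. The sketch below assumes that each throughput multiple is measured relative to one simulation running alone on the same cluster; the function name `per_simulation_efficiency` is illustrative and does not come from the paper.

```python
def per_simulation_efficiency(throughput_multiple: float, n_simulations: int) -> float:
    """Fraction of standalone speed each co-located simulation retains.

    Assumes the throughput multiple is relative to a single simulation
    running alone on the same cluster (an assumption, not stated here).
    """
    if n_simulations <= 0:
        raise ValueError("n_simulations must be positive")
    return throughput_multiple / n_simulations


# Throughput multiples reported in the findings, keyed by N.
reported = {2: 1.77, 5: 3.74}

efficiency = {n: per_simulation_efficiency(t, n) for n, t in reported.items()}

for n, e in efficiency.items():
    print(f"N={n}: each simulation runs at {e:.3f} of standalone speed")
```

Under that assumption, two co-located simulations each retain about 88.5% of standalone speed, and five retain about 74.8%.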
Abstract
When MPI-parallel simulations run on shared Kubernetes clusters, conventional CPU scheduling leaves the vast majority of provisioned cycles idle at synchronization barriers. This paper presents a multiplexing framework that reclaims this idle capacity by co-locating multiple simulations on the same cluster. PMPI-based duty-cycle profiling quantifies the per-rank idle fraction; proportional CPU allocation then allows a second simulation to execute concurrently with minimal overhead, yielding 1.77x throughput. A Pareto sweep to N=5 concurrent simulations shows throughput scaling to 3.74x, with a knee at N=3 offering the best efficiency-cost trade-off. An analytical model with a single fitted parameter predicts these gains within +/-4%. A dynamic controller automates the full pipeline, from profiling through In-Place Pod Vertical Scaling (KEP-1287) to packing and fairness monitoring,…
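The abstract does not give the formulas behind duty-cycle profiling or proportional CPU allocation. As a rough illustration only, the sketch below assumes that a rank's idle fraction is the share of wall time it spends blocked in MPI calls, and that CPU is split between co-located simulations in proportion to their busy fractions. The function names, the millicore budget, and the rounding rule are all hypothetical and are not taken from the paper.

```python
from typing import List


def idle_fraction(compute_seconds: float, mpi_wait_seconds: float) -> float:
    """Share of wall time a rank spends waiting inside MPI calls.

    Assumed definition: wait time divided by total (compute + wait) time.
    """
    total = compute_seconds + mpi_wait_seconds
    if total <= 0:
        raise ValueError("total time must be positive")
    return mpi_wait_seconds / total


def proportional_cpu_shares(duty_cycles: List[float], total_millicores: int) -> List[int]:
    """Split a CPU budget across simulations in proportion to duty cycle.

    duty_cycles holds each simulation's busy fraction (1 - idle fraction).
    This proportional rule is an assumption made for illustration.
    """
    total_duty = sum(duty_cycles)
    if total_duty <= 0:
        raise ValueError("at least one duty cycle must be positive")
    shares = [round(total_millicores * d / total_duty) for d in duty_cycles]
    # Push any rounding drift onto the last entry so the budget is met exactly.
    shares[-1] += total_millicores - sum(shares)
    return shares


# Hypothetical profile: a rank computes for 2 s and waits at barriers for 8 s.
idle = idle_fraction(compute_seconds=2.0, mpi_wait_seconds=8.0)

# Hypothetical split of one core (1000 millicores) between two simulations.
shares = proportional_cpu_shares([0.25, 0.75], total_millicores=1000)
```

In a real deployment the resulting shares would be applied as pod CPU requests and limits, which the abstract says the controller adjusts through In-Place Pod Vertical Scaling.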
