SuperSONIC: Cloud-Native Infrastructure for ML Inferencing

Dmitry Kondratyev; Benedikt Riedel; Yuan-Tang Chou; Miles Cochran-Branson; Noah Paladino; David Schultz; Mia Liu; Javier Duarte; Philip Harris; Shih-Chieh Hsu

arXiv:2506.20657·cs.DC·June 26, 2025

TL;DR

SuperSONIC is a scalable, cloud-native server infrastructure that makes ML inference deployment more efficient across scientific experiments. It offloads inference to coprocessors using Kubernetes and NVIDIA Triton Inference Server, enabling flexible, high-throughput processing.

Contribution

The paper introduces SuperSONIC, a scalable server infrastructure that standardizes and optimizes ML inference deployment on cloud-native platforms for scientific research.

Findings

01 Successfully deployed for CERN experiments and astrophysics observatories.

02 Enhanced resource utilization and throughput in ML inference workflows.

03 Demonstrated scalability and flexibility across diverse scientific domains.

Abstract

The increasing computational demand from growing data rates and complex machine learning (ML) algorithms in large-scale scientific experiments has driven the adoption of the Services for Optimized Network Inference on Coprocessors (SONIC) approach. SONIC accelerates ML inference by offloading it to local or remote coprocessors to optimize resource utilization. Leveraging its portability to different types of coprocessors, SONIC enhances data processing and model deployment efficiency for cutting-edge research in high energy physics (HEP) and multi-messenger astrophysics (MMA). We developed the SuperSONIC project, a scalable server infrastructure for SONIC, enabling the deployment of computationally intensive tasks to Kubernetes clusters equipped with graphics processing units (GPUs). Using NVIDIA Triton Inference Server, SuperSONIC decouples client workflows from server infrastructure,…
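The decoupling of client workflows from server infrastructure described in the abstract can be illustrated with a small sketch. This is not SuperSONIC or Triton code: every name here (`InferenceBackend`, `LocalBackend`, `SimulatedRemoteBackend`, `process_events`, `toy_model`) is hypothetical, and the "remote" server is simulated in-process, so the example only shows the general idea that a client workflow can stay unchanged while the place where inference runs is swapped.

```python
from typing import List, Protocol, Sequence


class InferenceBackend(Protocol):
    """Hypothetical interface: anything that can run a model on a batch."""

    def infer(self, batch: Sequence[float]) -> List[float]: ...


def toy_model(x: float) -> float:
    # Stand-in for an ML model; purely illustrative.
    return 2.0 * x + 1.0


class LocalBackend:
    """Runs the model inside the client process (no offloading)."""

    def infer(self, batch: Sequence[float]) -> List[float]:
        return [toy_model(x) for x in batch]


class SimulatedRemoteBackend:
    """Stands in for a remote inference server; counts the requests it serves.

    Assumption: a real deployment would send the batch over the network
    to a server backed by coprocessors instead of computing it here.
    """

    def __init__(self) -> None:
        self.requests_served = 0

    def infer(self, batch: Sequence[float]) -> List[float]:
        self.requests_served += 1
        return [toy_model(x) for x in batch]


def process_events(
    events: Sequence[float], backend: InferenceBackend, batch_size: int = 2
) -> List[float]:
    """Client workflow: split events into batches and delegate inference."""
    outputs: List[float] = []
    for start in range(0, len(events), batch_size):
        outputs.extend(backend.infer(events[start:start + batch_size]))
    return outputs


events = [0.0, 1.0, 2.0, 3.0, 4.0]

# The same client code runs against either backend.
local_out = process_events(events, LocalBackend())
remote = SimulatedRemoteBackend()
remote_out = process_events(events, remote)
```

Because `process_events` depends only on the `infer` method, the backend can be replaced or scaled without touching the client workflow, which is the property the abstract attributes to the SONIC approach.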
