Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services
Ali Doosthosseini, Jonathan Decker, Hendrik Nolte, Julian M. Kunkel

TL;DR
This paper presents a secure, efficient, and seamless solution for deploying large language models on HPC clusters using Slurm, enabling private AI services that integrate with existing HPC workflows.
Contribution
It introduces a Slurm-native architecture for hosting LLMs on HPC systems, combining cloud web services with secure HPC backend deployment, and demonstrates its practical deployment.
Findings
Successful deployment as a production service
Secure HPC hosting with privacy guarantees
Seamless integration with Slurm workload management
Abstract
The widespread adoption of large language models (LLMs) has created a pressing need for an efficient, secure, and private serving infrastructure that allows researchers to run open-source or custom fine-tuned LLMs and assures users that their data remains private and is not stored without their consent. While high-performance computing (HPC) systems equipped with state-of-the-art GPUs are well suited for training LLMs, their batch scheduling paradigm is not designed to support real-time serving of AI applications. Cloud systems, on the other hand, are well suited for web services but commonly lack access to the computational power of HPC clusters, especially the expensive and scarce high-end GPUs required for optimal inference speed. We propose an architecture with an implementation consisting of a web service that runs on a cloud VM with secure access to a scalable backend…
Taxonomy
Topics: Distributed and Parallel Computing Systems · IoT and Edge/Fog Computing · Cloud Computing and Resource Management
