Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services

Ali Doosthosseini; Jonathan Decker; Hendrik Nolte; Julian M. Kunkel

arXiv:2407.00110 · cs.DC · August 5, 2024 · 6 cites

TL;DR

This paper presents a secure, efficient, and seamless solution for deploying large language models on HPC clusters using Slurm, enabling private AI services that integrate with existing HPC workflows.

Contribution

It introduces a Slurm-native architecture for hosting LLMs on HPC systems, combining cloud web services with secure HPC backend deployment, and demonstrates its practical deployment.

Findings

01 Successful deployment as a production service

02 Secure HPC hosting with privacy guarantees

03 Seamless integration with Slurm workload management

Abstract

The widespread adoption of large language models (LLMs) has created a pressing need for efficient, secure, and private serving infrastructure that allows researchers to run open-source or custom fine-tuned LLMs and assures users that their data remains private and is not stored without their consent. While high-performance computing (HPC) systems equipped with state-of-the-art GPUs are well suited for training LLMs, their batch scheduling paradigm is not designed to support real-time serving of AI applications. Cloud systems, on the other hand, are well suited for web services but commonly lack access to the computational power of HPC clusters, especially the expensive and scarce high-end GPUs required for optimal inference speed. We propose an architecture with an implementation consisting of a web service that runs on a cloud VM with secure access to a scalable backend…

Taxonomy

Topics: Distributed and Parallel Computing Systems · IoT and Edge/Fog Computing · Cloud Computing and Resource Management