Deploying AI Frameworks on Secure HPC Systems with Containers

David Brayford; Sofia Vallecorsa; Atanas Atanasov; Fabio Baruffa; Walter Riviera

arXiv:1905.10090·cs.DC·January 14, 2020

TL;DR

This paper discusses the challenges and solutions for deploying AI frameworks on secure HPC systems, specifically using containers to enable compatibility and security in environments like SuperMUC-NG.

Contribution

It presents a practical approach for deploying AI frameworks on secure HPC systems using container technology, addressing compatibility and security challenges.

Findings

01. Successful deployment of AI frameworks on SuperMUC-NG

02. Containerization enables compatibility with HPC security restrictions

03. Addresses challenges faced by data scientists in HPC environments

Abstract

The increasing interest from the research community and industry in using Artificial Intelligence (AI) techniques to tackle "real world" problems requires High Performance Computing (HPC) resources to efficiently compute and scale complex algorithms across thousands of nodes. Unfortunately, typical data scientists are not familiar with the unique requirements and characteristics of HPC environments. They usually develop their applications with high-level scripting languages or frameworks such as TensorFlow, and the installation process often requires connection to external systems to download open source software during the build. HPC environments, on the other hand, are often based on closed source applications that incorporate parallel and distributed computing APIs such as MPI and OpenMP, while users have restricted administrator privileges and face security restrictions…
