Beyond Pre-Training: The Full Lifecycle of Foundation Models on HPC Systems

Dino Conciatore; Elia Oggian; Federico Da Forno; Stefano Schuppli; Jerome Tissieres; Joost VandeVondele; Maxime Martinasso

arXiv:2604.12599 · cs.DC · April 15, 2026

TL;DR

This paper examines the full AI lifecycle on HPC systems and proposes a hybrid cloud-native platform at the Swiss National Supercomputing Centre (CSCS) to support efficient fine-tuning and inference workflows.

Contribution

It introduces a novel Kubernetes-based architecture that combines HPC and cloud resources to manage the complete AI lifecycle on supercomputers.

Findings

1. The hybrid platform improves user productivity in AI workflows.
2. An analysis of trade-offs in fine-tuning pipelines and inference services.
3. A blueprint for integrating AI services into supercomputing environments.

Abstract

Large-scale pre-training of Foundation Models (FMs) constitutes a computationally intensive first phase for enabling AI across diverse scientific and societal applications. This first phase has positioned High-Performance Computing (HPC) facilities as indispensable backbones of "Sovereign AI" initiatives. While the massive throughput requirements of FM pre-training align with the traditional capability-oriented mission of HPC, the subsequent phases of the AI lifecycle, typically fine-tuning and inference, introduce operational paradigms that can conflict with established batch-processing environments. Moreover, these phases are not computationally trivial: they often require substantial high-end compute resources while exhibiting hardware utilization patterns that differ significantly from those of pre-training. This paper addresses the architectural and strategic…