Rethinking High Performance Computing Platforms: Challenges, Opportunities and Recommendations
Ole Weidner, Malcolm Atkinson, Adam Barker, Rosa Filgueira

TL;DR
This paper critiques current HPC platform models and proposes a more symmetric, application-centric model, together with a prototype implementation, to better support second-generation data-intensive, dynamic applications.
Contribution
It introduces an extended HPC platform model emphasizing decentralization, introspection, and bidirectional control, along with a prototype based on Linux Containers to improve application support.
Findings
Prototype cHPC operates alongside existing systems
Enhanced platform API enables symmetric control and information flow
Roadmap for future research and evaluation provided
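The finding that an enhanced platform API enables symmetric control and information flow can be made more concrete with a toy model. The following is a minimal sketch under stated assumptions. The `Platform` and `Allocation` classes and the `submit`, `query`, `subscribe` and `request_resize` methods are hypothetical illustrations, not the cHPC API described in the paper. In the sketch, the application can inspect its allocation and request changes at runtime, and the platform can push events back to the application. This contrasts with a purely one-way model where the only interaction is the initial resource request at submission time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Allocation:
    """Resources currently held by one job (hypothetical model)."""
    nodes: int
    walltime_min: int


@dataclass
class Platform:
    """Toy platform where control and information flow in both directions.

    All names are hypothetical illustrations, not the cHPC API.
    """
    free_nodes: int
    jobs: Dict[str, Allocation] = field(default_factory=dict)
    listeners: Dict[str, List[Callable[[str, dict], None]]] = field(default_factory=dict)

    def submit(self, job_id: str, nodes: int, walltime_min: int) -> bool:
        # One-way interaction: resources are requested once, up front.
        if nodes > self.free_nodes:
            return False
        self.free_nodes -= nodes
        self.jobs[job_id] = Allocation(nodes, walltime_min)
        return True

    def query(self, job_id: str) -> dict:
        # Introspection: the application reads its allocation and platform state.
        alloc = self.jobs[job_id]
        return {"nodes": alloc.nodes, "walltime_min": alloc.walltime_min,
                "free_nodes": self.free_nodes}

    def subscribe(self, job_id: str, callback: Callable[[str, dict], None]) -> None:
        # Platform-to-application channel: register a handler for platform events.
        self.listeners.setdefault(job_id, []).append(callback)

    def _notify(self, job_id: str, event: str, data: dict) -> None:
        # Deliver an event to every handler the job registered.
        for cb in self.listeners.get(job_id, []):
            cb(event, data)

    def request_resize(self, job_id: str, delta_nodes: int) -> bool:
        # Application-to-platform control at runtime: grow (delta > 0) or
        # shrink (delta < 0) the allocation, keeping at least one node.
        alloc = self.jobs[job_id]
        if delta_nodes > self.free_nodes or alloc.nodes + delta_nodes < 1:
            self._notify(job_id, "resize_denied", {"requested": delta_nodes})
            return False
        self.free_nodes -= delta_nodes
        alloc.nodes += delta_nodes
        self._notify(job_id, "resized", {"nodes": alloc.nodes})
        return True
```

For example, with `p = Platform(free_nodes=8)`, a job submitted with 4 nodes that calls `p.request_resize("sim", 2)` grows to 6 nodes and receives a `"resized"` event. A subsequent request for 5 more nodes is denied because only 2 remain free, and the job is told so through the same event channel rather than failing silently.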
Abstract
A new class of second-generation high-performance computing applications with heterogeneous, dynamic and data-intensive properties has an extended set of requirements, which cover application deployment, resource allocation and control, and I/O scheduling. These requirements are not met by the current production HPC platform models and policies. This results in a loss of opportunity, productivity and innovation for new computational methods and tools. It also decreases effective system utilization for platform providers due to unsupervised workarounds and rogue resource management strategies implemented in application space. In this paper, we critically discuss the dominant HPC platform model and describe the challenges it creates for second-generation applications because of its asymmetric resource view, interfaces and software deployment policies. We present an extended, more symmetric…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Scientific Computing and Data Management
