HPC AI500: A Benchmark Suite for HPC AI Systems
Zihan Jiang, Wanling Gao, Lei Wang, Xingwang Xiong, Yuchen Zhang, Xu Wen, Chunjie Luo, Hainan Ye, Yunquan Zhang, Shengzhong Feng, Kenli Li, Weijia Xu, Jianfeng Zhan

TL;DR
HPC AI500 is a benchmark suite for evaluating high performance computing systems that run scientific deep learning workloads. It covers diverse applications, defines a set of evaluation metrics, and provides a scalable reference implementation.
Contribution
It introduces a new benchmark suite for HPC AI systems based on real scientific DL applications, with comprehensive metrics and an open-source implementation.
Findings
Includes 14 scientific DL benchmarks from various fields.
Provides metrics considering accuracy, performance, power, and cost.
Offers a scalable reference implementation as part of AIBench.
Abstract
In recent years, with the trend of applying deep learning (DL) in high performance scientific computing, the unique characteristics of emerging DL workloads in HPC raise great challenges in designing and implementing HPC AI systems. The community needs a new yardstick for evaluating future HPC systems. In this paper, we propose HPC AI500, a benchmark suite for evaluating HPC systems that run scientific DL workloads. Covering the most representative scientific fields, each workload in HPC AI500 is based on a real-world scientific DL application. Currently, we choose 14 scientific DL benchmarks from the perspectives of application scenarios, data sets, and software stack. We propose a set of metrics for comprehensively evaluating HPC AI systems, considering accuracy and performance as well as power and cost. We provide a scalable reference implementation of HPC AI500. HPC…
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Scientific Computing and Data Management
