Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis

Albert Reuther; Jeremy Kepner; Chansup Byun; Siddharth Samsi; William Arcand; David Bestor; Bill Bergeron; Vijay Gadepally; Michael Houle; Matthew Hubbell; Michael Jones; Anna Klein; Lauren Milechin; Julia Mullen; Andrew Prout; Antonio Rosa; Charles Yee; Peter Michaleas

arXiv:1807.07814·cs.DC·December 3, 2019

TL;DR

This paper presents an interactive supercomputing system that launches thousands of machine learning and data analysis tasks within seconds on a 40,000-core supercomputer, enabling rapid experimentation.

Contribution

It shows that careful tuning of launches and prepositioning of applications allow interactive frameworks such as TensorFlow and MATLAB/Octave to scale to tens of thousands of cores with launch times measured in seconds.

Findings

01. 32,000 TensorFlow processes launched in 4 seconds

02. 262,000 Octave processes launched in 40 seconds

03. Enables rapid exploration of machine learning architectures
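
To put the two launch figures on a common scale, the short Python sketch below converts each into an average launch rate in processes per second. Only the process counts and launch times come from the findings above; the `launch_rate` helper and the `results` dictionary are illustrative names, and the average says nothing about how the rate varied during a launch.

```python
def launch_rate(processes: int, seconds: float) -> float:
    """Average number of processes started per second over the whole launch."""
    return processes / seconds


# (process count, launch time in seconds), as reported in the findings
results = {
    "TensorFlow": (32_000, 4),
    "Octave": (262_000, 40),
}

rates = {name: launch_rate(n, t) for name, (n, t) in results.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} processes/second")
# TensorFlow: 8,000 processes/second
# Octave: 6,550 processes/second
```

By this measure the two launches sustain average rates of the same order, even though the Octave run starts roughly eight times as many processes.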

Abstract

Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges, in particular rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with thousands of dependencies. Careful tuning of launches and prepositioning of applications overcome these challenges and allow the launching of thousands of tasks in seconds on a 40,000-core supercomputer. Specifically, this work demonstrates launching 32,000 TensorFlow processes in 4 seconds and…
