The Design and Implementation of a Scalable DL Benchmarking Platform
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

TL;DR
This paper introduces MLModelScope, a scalable, framework- and hardware-agnostic DL benchmarking platform that enables consistent, reproducible evaluations of models across diverse systems, facilitating fair comparison and analysis.
Contribution
The paper presents MLModelScope, a novel open-source DL benchmarking platform designed around 10 key features that make evaluation consistent, scalable, and agnostic to the underlying hardware and software.
Findings
Evaluated 37 models across 4 systems, revealing the impact of hardware and framework choices.
Demonstrated MLModelScope's ability to identify performance bottlenecks.
Showcased a comprehensive analysis of model accuracy and performance under various scenarios.
Abstract
The current Deep Learning (DL) landscape is fast-paced and rife with non-uniform models and hardware/software (HW/SW) stacks, but lacks a DL benchmarking platform to facilitate evaluation and comparison of DL innovations, be they models, frameworks, libraries, or hardware. Due to the lack of a benchmarking platform, the current practice of evaluating the benefits of proposed DL innovations is both arduous and error-prone, stifling the adoption of the innovations. In this work, we first identify design features that are desirable within a DL benchmarking platform. These features include: performing the evaluation in a consistent, reproducible, and scalable manner, being framework and hardware agnostic, supporting real-world benchmarking workloads, providing in-depth model execution inspection across the HW/SW stack levels, etc. We then propose MLModelScope, a DL benchmarking…
