Towards Training Reproducible Deep Learning Models

Boyuan Chen; Mingzhi Wen; Yong Shi; Dayi Lin; Gopi Krishnan Rajbahadur; Zhen Ming (Jack) Jiang

arXiv:2202.02326·cs.LG·February 8, 2022

TL;DR

This paper presents a systematic approach to improving the reproducibility of deep learning models by addressing software randomness and hardware non-determinism, validated through case studies on six open-source models.

Contribution

The paper introduces a unified framework that combines record-and-replay and profile-and-patch techniques, together with evaluation criteria and guidelines for reproducible deep learning training.
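
To make the record-and-replay idea concrete, here is a minimal sketch of how random draws made during one run could be logged and then fed back verbatim in a second run. It is an illustration only, not the paper's framework: the class `RecordReplayRandom` and the function `init_weights` are hypothetical names invented for this example, and the toy "training step" stands in for any software source of randomness such as weight initialisation or data shuffling.

```python
import random


class RecordReplayRandom:
    """Hypothetical helper: records random draws, or replays a recorded log."""

    def __init__(self, log=None):
        # No log means "record" mode; passing a log switches to "replay" mode.
        self.replaying = log is not None
        self.log = list(log) if log is not None else []
        self._cursor = 0
        self._rng = random.Random()  # deliberately left unseeded

    def random(self):
        if self.replaying:
            if self._cursor >= len(self.log):
                raise RuntimeError("replay log exhausted")
            value = self.log[self._cursor]
            self._cursor += 1
            return value
        value = self._rng.random()
        self.log.append(value)  # remember the draw for a later replay
        return value


def init_weights(rng, n):
    """Toy stand-in for a random training step (assumed for illustration)."""
    return [rng.random() - 0.5 for _ in range(n)]


# First run: draw fresh random numbers and record them.
recorder = RecordReplayRandom()
first_run = init_weights(recorder, 4)

# Second run: replay the recorded draws instead of sampling again.
replayer = RecordReplayRandom(log=recorder.log)
second_run = init_weights(replayer, 4)

print(first_run == second_run)  # True
```

The design choice here is that the second run never consults a seed at all; it consumes the recorded values in order, so it matches the first run even though the original generator was unseeded.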

Findings

01. Successfully reproduces six open-source DL models

02. Effectively mitigates software and hardware sources of randomness

03. Provides a practical guideline for reproducible DL training

Abstract

Reproducibility is an increasing concern in Artificial Intelligence (AI), particularly in the area of Deep Learning (DL). Being able to reproduce DL models is crucial for AI-based systems, as it is closely tied to various tasks like training, testing, debugging, and auditing. However, DL models are challenging to reproduce due to issues like randomness in the software (e.g., DL algorithms) and non-determinism in the hardware (e.g., GPU). There are various practices to mitigate some of these issues. However, many of them are either too intrusive or work only in a specific usage context. In this paper, we propose a systematic approach to training reproducible DL models. Our approach includes three main parts: (1) a set of general criteria to thoroughly evaluate the reproducibility of DL models for two different domains, (2) a unified framework which leverages a…
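
The abstract names non-determinism in the hardware (e.g., GPU) without spelling out a mechanism. One commonly cited cause, assumed here rather than taken from the abstract, is that parallel reductions may add floating-point numbers in a different order from run to run, and floating-point addition is not associative. The sketch below shows that effect on the CPU in plain Python; the helper `sum_in_order` and the sample values are hypothetical and chosen only to make the rounding visible.

```python
# Floating-point addition is not associative: grouping changes the rounding.
left_grouped = (0.1 + 0.2) + 0.3
right_grouped = 0.1 + (0.2 + 0.3)
print(left_grouped == right_grouped)  # False


def sum_in_order(values, order):
    """Accumulate values in the given index order (hypothetical helper)."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# The same four numbers, summed in two different orders.
values = [1e16, 1.0, -1e16, 1.0]
forward = sum_in_order(values, [0, 1, 2, 3])    # the first 1.0 is absorbed by 1e16
reordered = sum_in_order(values, [0, 2, 1, 3])  # large terms cancel first

print(forward)    # 1.0
print(reordered)  # 2.0
```

If an accelerator schedules such accumulations differently across runs, two trainings with identical seeds can still drift apart, which is why seeding alone does not address the hardware side.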

Code & Models

Repositories

nemo9cby/icse2022rep
tfOfficial
