MONET: Modeling and Optimization of neural NEtwork Training from Edge to Data Centers
Jérémy Morlier, Robin Geens, Stef Cuyckens, Arne Symons, Marian Verhelst, Vincent Gripon, Mathieu Léonardon

TL;DR
MONET is a framework that models neural network training on heterogeneous accelerators, enabling exploration of hardware configurations and trade-offs specific to training workloads, whose constraints differ from those of inference.
Contribution
This paper introduces MONET, a novel framework for modeling and optimizing neural network training workflows on heterogeneous dataflow accelerators, addressing a gap in existing inference-focused tools.
Findings
MONET accurately models training workflows for ResNet-18 and GPT-2.
The framework identifies optimal layer-fusion configurations for training.
Trade-offs in activation checkpointing are effectively explored using genetic algorithms.
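To make the activation checkpointing trade-off concrete, here is a minimal sketch under a deliberately simplified cost model: each layer's activation is either stored (costing memory) or dropped and recomputed during the backward pass (costing extra forward compute). The per-layer numbers, the names ACT_MEM, FWD_COST, evaluate and pareto_front, and the cost model itself are illustrative assumptions, not taken from MONET. The sketch also uses exhaustive enumeration, which is only feasible for a handful of layers, rather than the genetic algorithm the paper uses.

```python
from itertools import product

# Hypothetical per-layer numbers (assumptions, not from the paper):
# activation size and forward-pass cost, in arbitrary units.
ACT_MEM = [4, 8, 8, 2]
FWD_COST = [1, 3, 3, 1]


def evaluate(keep):
    """Return (stored_memory, recompute_cost) for a store/recompute mask.

    keep[i] is True if layer i's activation is stored for the backward pass,
    False if it is dropped and recomputed (toy model: recomputing a layer
    costs exactly one extra forward pass of that layer).
    """
    memory = sum(m for m, k in zip(ACT_MEM, keep) if k)
    recompute = sum(c for c, k in zip(FWD_COST, keep) if not k)
    return memory, recompute


def pareto_front():
    """Enumerate all masks and keep the non-dominated (memory, recompute) points."""
    points = {evaluate(keep) for keep in product([False, True], repeat=len(ACT_MEM))}
    front = [
        p for p in points
        if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
    ]
    return sorted(front)


if __name__ == "__main__":
    for memory, recompute in pareto_front():
        print(f"memory={memory:2d}  recompute={recompute}")
```

Each printed row is a configuration that cannot be improved in memory without paying more recomputation. With realistic layer counts the search space grows as 2^n, which is one reason a heuristic search such as a genetic algorithm becomes attractive.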
Abstract
While hardware-software co-design has significantly improved the efficiency of neural network inference, modeling the training phase remains a critical yet underexplored challenge. Training workloads impose distinct constraints, particularly regarding memory footprint and backpropagation complexity, which existing inference-focused tools fail to capture. This paper introduces MONET, a framework designed to model the training of neural networks on heterogeneous dataflow accelerators. MONET builds upon Stream, an experimentally verified framework that models the inference of neural networks on heterogeneous dataflow accelerators with layer fusion. Using MONET, we explore the design space of ResNet-18 and a small GPT-2, demonstrating the framework's capability to model training workflows and find better hardware architectures. We then further examine problems that become more complex…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)
