SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks

Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Parsa Kavehzadeh, Marzieh Tahaei, Boxing Chen, and Ali Ghodsi

arXiv:2309.00255 · cs.LG · June 4, 2024

TL;DR

SortedNet is a scalable, generalized framework for training modular deep neural networks that enables efficient multi-model training, reduces storage, and improves dynamic model selection across various architectures and tasks.

Contribution

It introduces a novel nested architecture and update scheme that allows simultaneous training of multiple sub-models with minimal overhead, enhancing flexibility and scalability.
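
As a rough illustration of the nested, shared-parameter idea described above, the following Python sketch trains three sub-models of a toy linear model at once. Each sub-model of width k uses only the first k entries of a single shared weight vector. Everything here is an assumption made for illustration rather than the paper's method: the toy linear model, the hypothetical function names `make_data`, `predict`, `train_nested`, and `mse`, the uniform sampling of one sub-model per step, and the plain SGD update.

```python
import random


def make_data(n=200, seed=1):
    """Toy regression data: y is a fixed linear function of three features."""
    rng = random.Random(seed)
    true_w = [2.0, 1.0, 0.5]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        data.append((x, sum(w * xi for w, xi in zip(true_w, x))))
    return data


def predict(weights, x, width):
    """Sub-model of a given width: uses only the first `width` shared weights."""
    return sum(w * xi for w, xi in zip(weights[:width], x[:width]))


def mse(weights, data, width):
    """Mean squared error of the width-`width` sub-model on `data`."""
    return sum((predict(weights, x, width) - y) ** 2 for x, y in data) / len(data)


def train_nested(data, dim, widths, steps=2000, lr=0.05, seed=0):
    """Train all nested sub-models through one shared weight vector."""
    rng = random.Random(seed)
    weights = [0.0] * dim  # single parameter store shared by every sub-model
    for _ in range(steps):
        x, y = rng.choice(data)
        width = rng.choice(widths)  # assumption: pick one sub-model uniformly per step
        err = predict(weights, x, width) - y
        for i in range(width):  # SGD step touches only the sampled sub-model's prefix
            weights[i] -= lr * err * x[i]
    return weights


if __name__ == "__main__":
    data = make_data()
    shared = train_nested(data, dim=3, widths=[1, 2, 3])
    for k in (1, 2, 3):
        print(f"width={k} mse={mse(shared, data, k):.4f}")
```

In this sketch one stored weight vector serves every width, so a caller can choose `width` at inference time to match its compute budget without loading a separate model.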

Findings

01

Trains 160 sub-models simultaneously while retaining 96% of the original model's performance.

02

Validates versatility across NLP and image classification tasks.

03

Outperforms existing dynamic training methods.

Abstract

Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous user/task-specific models. There are solutions in the literature to deal with single dynamic or many-in-one models instead of many individual networks; however, they suffer from significant drops in performance, lack of generalization across different model architectures or different dimensions (e.g., depth, width, attention blocks), heavy model search requirements during training, and training a limited number of sub-models. To address these limitations, we propose SortedNet, a generalized and scalable training solution to harness the inherent modularity of DNNs. Thanks to a generalized nested architecture (which we refer to as the "sorted" architecture in this paper) with shared parameters and its novel…

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Brain Tumor Detection and Classification

Methods: Attention Is All You Need · Linear Layer · WordPiece · Average Pooling · Layer Normalization · Dropout · Multi-Head Attention · Convolution · Attention Dropout