SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks
Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Parsa Kavehzadeh, Marzieh Tahaei, Boxing Chen, and Ali Ghodsi

TL;DR
SortedNet is a scalable, generalized framework for training modular deep neural networks that enables efficient multi-model training, reduces storage, and improves dynamic model selection across various architectures and tasks.
Contribution
It introduces a novel nested architecture and update scheme that allows simultaneous training of multiple sub-models with minimal overhead, enhancing flexibility and scalability.
Findings
Trains 160 sub-models simultaneously while retaining 96% of the original model's performance.
Validates versatility across NLP and image classification tasks.
Outperforms existing dynamic training methods.
Abstract
Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous user/task-specific models. There are solutions in the literature to deal with single dynamic or many-in-one models instead of many individual networks; however, they suffer from significant drops in performance, lack of generalization across different model architectures or different dimensions (e.g. depth, width, attention blocks), heavy model search requirements during training, and training a limited number of sub-models. To address these limitations, we propose SortedNet, a generalized and scalable training solution to harness the inherent modularity of DNNs. Thanks to a generalized nested architecture (which we refer to as sorted architecture in this paper) with shared parameters and its novel…
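The idea of nested sub-models with shared parameters can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes that "nested" means a sub-model of width k uses only the first k rows of one shared weight matrix, and that one sub-model is sampled at random per training step; the abstract is truncated before it describes the actual update scheme, so that sampling rule is an assumption. The names sub_forward, train_step, and widths are hypothetical.

```python
import numpy as np

def sub_forward(W, x, width):
    # Sub-model of size `width` uses only the first `width` rows of the shared W.
    return W[:width] @ x

def sub_loss(W, x, target, width):
    # Squared error of one sub-model against the matching slice of the target.
    err = sub_forward(W, x, width) - target[:width]
    return 0.5 * float(err @ err)

def train_step(W, x, target, width, lr=0.1):
    # One gradient step for the chosen sub-model; rows beyond `width` are untouched.
    err = sub_forward(W, x, width) - target[:width]
    W[:width] -= lr * np.outer(err, x)

rng = np.random.default_rng(0)
full_width, in_dim = 8, 4
W = rng.normal(size=(full_width, in_dim))  # single shared parameter store
x = rng.normal(size=in_dim)
x = x / np.linalg.norm(x)                  # unit norm keeps the toy update stable
target = rng.normal(size=full_width)
widths = [2, 4, 8]                         # hypothetical sub-model sizes

first_loss = {w: sub_loss(W, x, target, w) for w in widths}
for _ in range(200):
    k = widths[rng.integers(len(widths))]  # assumption: sample one sub-model per step
    train_step(W, x, target, k)
final_loss = {w: sub_loss(W, x, target, w) for w in widths}
```

Because every sub-model reads from the same matrix, only one set of weights is stored, and the output of a smaller sub-model is always a prefix of the larger one's output. Selecting a sub-model at inference time is then just a choice of width.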
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Brain Tumor Detection and Classification
Methods: Attention Is All You Need · Linear Layer · WordPiece · Average Pooling · Layer Normalization · Dropout · Multi-Head Attention · Convolution · Attention Dropout
