NeMo: A Neuron-Level Modularizing-While-Training Approach for Decomposing DNN Models

Xiaohan Bi; Binhang Qi; Hailong Sun; Xiang Gao; Yue Yu; Xiaojun Liang

arXiv:2508.11348 · cs.LG · August 18, 2025

TL;DR

NeMo introduces a neuron-level modularization training method for DNNs that improves scalability and generalization, enabling effective decomposition of diverse large-scale models like Transformers.

Contribution

NeMo presents a scalable, neuron-level modular training approach that extends modularization to large-scale and diverse DNN architectures, including Transformers.

Findings

01

Achieves 1.72% higher module classification accuracy

02

Reduces module size by 58.10% on average

03

Demonstrates effectiveness on CNN and Transformer models

Abstract

With the growing incorporation of deep neural network (DNN) models into modern software systems, the prohibitive construction costs have become a significant challenge. Model reuse has been widely applied to reduce training costs, but indiscriminately reusing entire models may incur significant inference overhead. Consequently, DNN modularization has gained attention, enabling module reuse by decomposing DNN models. The emerging modularizing-while-training (MwT) paradigm, which incorporates modularization into training, outperforms modularizing-after-training approaches. However, existing MwT methods focus on small-scale CNN models at the convolutional kernel level and struggle with diverse DNNs and large-scale models, particularly Transformer-based models. To address these limitations, we propose NeMo, a scalable and generalizable MwT approach. NeMo operates at the neuron level…
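To make the idea of a neuron-level module concrete, here is a minimal sketch of what reusing a subset of neurons from a trained layer could look like. It is not NeMo's algorithm, which the truncated abstract does not describe. The two-layer network, the `keep` mask, and the helper names `extract_module`, `param_count`, and `forward` are all hypothetical, chosen only for illustration; in an MwT setting the mask would be learned during training rather than written by hand.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def extract_module(w1: Matrix, w2: Matrix, keep: List[bool]) -> Tuple[Matrix, Matrix]:
    """Build a smaller two-layer network from the hidden neurons flagged in `keep`.

    w1: hidden x input weights (one row per hidden neuron).
    w2: output x hidden weights (one column per hidden neuron).
    """
    if len(keep) != len(w1):
        raise ValueError("mask length must equal the number of hidden neurons")
    idx = [i for i, flag in enumerate(keep) if flag]
    m1 = [w1[i][:] for i in idx]                # keep the selected rows
    m2 = [[row[i] for i in idx] for row in w2]  # keep the matching columns
    return m1, m2


def param_count(*mats: Matrix) -> int:
    """Total number of weights across the given matrices."""
    return sum(len(row) for m in mats for row in m)


def forward(w1: Matrix, w2: Matrix, x: List[float]) -> List[float]:
    """Two-layer forward pass with ReLU on the hidden layer (no biases)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


# Toy weights: 3 inputs, 4 hidden neurons, 2 outputs (values are made up).
w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
w2 = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
keep = [True, False, True, False]  # hypothetical mask: neurons 0 and 2 form the module

m1, m2 = extract_module(w1, w2, keep)
full_params = param_count(w1, w2)      # 12 + 8 = 20
module_params = param_count(m1, m2)    # 6 + 4 = 10
reduction = 1 - module_params / full_params
module_out = forward(m1, m2, [1.0, 2.0, 3.0])
```

In this toy case the module keeps half of the weights. The size reductions reported for NeMo are measured on real models and are unrelated to these numbers.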
