Mamba4Net: Distilled Hybrid Mamba Large Language Models For Networking
Linhan Xia, Mingzhan Yang, Jingjing Wang, Ziwei Yan, Yakun Ren, Guo Yu, Kai Lei

TL;DR
Mamba4Net introduces a distillation framework that transfers knowledge from transformer-based LLMs to a more efficient Mamba architecture, enabling high-performance networking tasks with reduced computational and memory requirements.
Contribution
It presents a cross-architecture distillation method that transfers networking-specific knowledge from transformer-based LLMs to Mamba-based student models, raising throughput and shrinking the storage footprint of LLMs in networking applications.
Findings
Achieves 3.96x higher throughput than transformer LLMs.
Uses only 5.48% of the storage footprint of previous LLM approaches.
Demonstrates superior performance across three networking tasks.
Abstract
Transformer-based large language models (LLMs) are increasingly being adopted in networking research to address domain-specific challenges. However, their quadratic time complexity and substantial model sizes often result in significant computational overhead and memory constraints, particularly in resource-constrained environments. Drawing inspiration from the efficiency and performance of the DeepSeek-R1 model within the knowledge distillation paradigm, this paper introduces Mamba4Net, a novel cross-architecture distillation framework. Mamba4Net transfers networking-specific knowledge from transformer-based LLMs to student models built on the Mamba architecture, which features linear time complexity. This design substantially enhances computational efficiency compared to the quadratic complexity of transformer-based models, while the reduced model size further minimizes computational…
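The abstract does not specify the distillation objective, so the following is only a minimal sketch of the generic soft-label distillation loss (a temperature-softened KL divergence between teacher and student output distributions), not the loss Mamba4Net actually uses. The function names `log_softmax` and `distillation_loss`, the default temperature, and the example logits are all assumptions made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper: the T^2 factor is the conventional rescaling that
    keeps gradient magnitudes comparable across temperatures.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (e.g. a transformer LLM)
    log_q = log_softmax(student_logits, temperature)  # student (e.g. a Mamba model)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # made-up logits over three output classes
    student = [1.0, 1.0, -0.5]
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge; in practice such a term is usually combined with a task loss on ground-truth labels.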
Taxonomy
Topics: Advanced Graph Neural Networks · Software-Defined Networks and 5G · Advanced Neural Network Applications
