Finedeep: Mitigating Sparse Activation in Dense LLMs via Multi-Layer Fine-Grained Experts
Leiyu Pan, Zhenpeng Su, Minxuan Lv, Yizhe Xiong, Xiangwen Zhang, Zijia Lin, Hui Chen, Jungong Han, Guiguang Ding, Cheng Luo, Di Zhang, Kun Gai, Deyi Xiong

TL;DR
Finedeep introduces a multi-layered expert architecture with a novel routing mechanism to reduce sparse activation in dense large language models, leading to improved performance without increasing parameters.
Contribution
The paper proposes a deep-layered, fine-grained expert architecture with a new routing mechanism that mitigates sparse activation in dense models, allowing them to make better use of their existing capacity.
Findings
Finedeep outperforms traditional dense models in perplexity and on downstream benchmarks.
The best results come from balancing the depth and width of the expert architecture.
The approach alleviates sparse activation and makes better use of model capacity.
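Sparse activation here means that many activation values sit close to zero. One hypothetical way to quantify it is to count the fraction of FFN activations whose magnitude falls below a small threshold. The sketch below assumes a GELU nonlinearity (tanh approximation) and a threshold of 0.1; the function names and the threshold are illustrative choices, not the measurement protocol used in the paper.

```python
import math

def gelu(x: float) -> float:
    # Tanh approximation of GELU (assumed activation function for this sketch).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def sparse_activation_ratio(activations, threshold=0.1):
    """Fraction of activation values with |a| < threshold (hypothetical metric).

    `activations` is a list of rows, e.g. one row of FFN hidden activations per token.
    """
    flat = [a for row in activations for a in row]
    if not flat:
        raise ValueError("no activations to measure")
    near_zero = sum(1 for a in flat if abs(a) < threshold)
    return near_zero / len(flat)

# Usage: pre-activations for a single token passed through GELU.
pre = [-3.0, -1.0, 0.0, 0.5, 2.0]
ratio = sparse_activation_ratio([[gelu(x) for x in pre]])
print(ratio)  # gelu(-3.0) and gelu(0.0) fall below 0.1 in magnitude -> 0.4
```

A lower ratio on the same inputs would indicate that more of the FFN's hidden units contribute non-negligibly to the output.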
Abstract
Large language models have demonstrated exceptional performance across a wide range of tasks. However, dense models usually suffer from sparse activation, where many activation values tend towards zero (i.e., they are effectively inactive). We argue that this may restrict how efficiently the model explores its representation space. To mitigate this issue, we propose Finedeep, a deep-layered fine-grained expert architecture for dense models. Our framework partitions the feed-forward network (FFN) layers of traditional dense models into small experts and arranges them across multiple sub-layers. A novel routing mechanism determines each expert's contribution. We conduct extensive experiments across various model sizes, demonstrating that our approach significantly outperforms traditional dense architectures in terms of perplexity and benchmark performance while maintaining a comparable…
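The partitioning described above can be sketched as follows, under several stated assumptions: the FFN's hidden width `d_ff` is split evenly into `num_sublayers * experts_per_sublayer` small experts; each sub-layer combines all of its experts (no top-k selection, so the model stays dense) using softmax weights from a linear router; and sub-layers are chained with residual connections. The class names, the softmax router, the ReLU nonlinearity, and the residual wiring are hypothetical illustrations, not the paper's exact routing mechanism.

```python
import math
import random

def matvec(w, x):
    # w: list of rows (out_dim x in_dim); x: vector of length in_dim.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class Expert:
    """A small FFN slice: d_model -> d_expert -> d_model (no biases)."""

    def __init__(self, d_model, d_expert, rng):
        self.w_in = rand_matrix(d_expert, d_model, rng)
        self.w_out = rand_matrix(d_model, d_expert, rng)

    def __call__(self, x):
        return matvec(self.w_out, relu(matvec(self.w_in, x)))

class FineGrainedFFN:
    """Hypothetical multi-sub-layer, fine-grained replacement for one dense FFN."""

    def __init__(self, d_model, d_ff, num_sublayers, experts_per_sublayer, seed=0):
        rng = random.Random(seed)
        total = num_sublayers * experts_per_sublayer
        if d_ff % total:
            raise ValueError("d_ff must divide evenly among all experts")
        self.d_expert = d_ff // total
        self.sublayers = [
            [Expert(d_model, self.d_expert, rng) for _ in range(experts_per_sublayer)]
            for _ in range(num_sublayers)
        ]
        # One linear router per sub-layer (assumed); it scores every expert.
        self.routers = [rand_matrix(experts_per_sublayer, d_model, rng)
                        for _ in range(num_sublayers)]

    def __call__(self, x):
        h = list(x)
        for experts, router in zip(self.sublayers, self.routers):
            weights = softmax(matvec(router, h))  # every expert gets a non-zero weight
            mixed = [0.0] * len(h)
            for w, expert in zip(weights, experts):
                mixed = [m + w * o for m, o in zip(mixed, expert(h))]
            h = [a + b for a, b in zip(h, mixed)]  # residual between sub-layers (assumed)
        return h

    def param_count(self):
        # Expert weights (w_in and w_out have the same size) and router weights.
        expert = sum(2 * len(e.w_in) * len(e.w_in[0]) for layer in self.sublayers for e in layer)
        router = sum(len(r) * len(r[0]) for r in self.routers)
        return expert, router

layer = FineGrainedFFN(d_model=8, d_ff=32, num_sublayers=2, experts_per_sublayer=4)
print(layer.param_count())  # (512, 64): experts match a dense 8->32->8 FFN's 2*8*32 weights
print(len(layer([1.0] * 8)))  # 8
```

Because the experts together use exactly the `2 * d_model * d_ff` weights of the dense FFN they replace, the only extra parameters in this sketch are the small routers, which is one way to keep the parameter count comparable to the dense baseline.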
