MCSD: An Efficient Language Model with Diverse Fusion
Hua Yang, Duohai Li, Shiman Li

TL;DR
The paper introduces MCSD, a resource-efficient language model with linear scaling and diverse feature fusion. It achieves high throughput and low memory usage while keeping competitive performance, which makes it suited to edge applications.
Contribution
We propose MCSD, a novel language model with a multi-channel slope and decay block for diverse feature fusion, enabling efficient inference with linear complexity.
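The contribution describes a multi-channel block whose slope and decay sections cover different temporal ranges and are then fused element-wise. The sketch below is illustrative only, because this page does not define either section. It assumes the decay section is an exponentially decaying causal sum and that the "slope" section applies linearly decreasing weights over a finite window. It also assumes fusion is an element-wise product. The names `decay_branch`, `slope_branch`, and `multi_channel_fuse`, and the parameters `gamma` and `window`, are hypothetical.

```python
from typing import List


def decay_branch(x: List[float], gamma: float) -> List[float]:
    """Assumed decay section: y[t] = sum_{s<=t} gamma**(t-s) * x[s].

    Computed with a running state, so the receptive field is unbounded but
    older inputs fade at rate gamma (long-range / global context).
    """
    out, state = [], 0.0
    for xt in x:
        state = gamma * state + xt
        out.append(state)
    return out


def slope_branch(x: List[float], window: int) -> List[float]:
    """Hypothetical slope section: weights fall linearly to zero over `window` steps.

    Weight for lag k is 1 - k / window, so only the last `window` inputs
    contribute (short-range / local context).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(min(window, t + 1)):
            acc += (1.0 - k / window) * x[t - k]
        out.append(acc)
    return out


def multi_channel_fuse(x: List[float], gammas: List[float], windows: List[int]) -> List[List[float]]:
    """Return one output sequence per channel.

    Each channel fuses its slope and decay branches with an element-wise
    (Hadamard) product; the product is an assumed fusion operator.
    """
    fused = []
    for gamma, window in zip(gammas, windows):
        d = decay_branch(x, gamma)
        s = slope_branch(x, window)
        fused.append([a * b for a, b in zip(s, d)])
    return fused


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    # Two channels with different temporal receptive fields.
    print(multi_channel_fuse(impulse, gammas=[0.5, 0.9], windows=[2, 4]))
```

For a unit impulse, the first channel (short window, fast decay) returns `[1.0, 0.25, 0.0, 0.0]`, so its response dies out after two steps. The second channel stays nonzero for all four steps. Giving each channel its own `gamma` and `window` is one simple way to obtain the local/global mix the contribution describes.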
Findings
MCSD achieves higher throughput than Transformers.
MCSD uses less GPU memory while maintaining performance.
MCSD achieves accuracy comparable to larger models on benchmarks.
Abstract
Transformers excel in Natural Language Processing (NLP) because they capture long-term dependencies well, but their resource consumption grows quadratically with sequence length. To address this challenge, we propose the MCSD model, an efficient language model with linear scaling and fast inference. The MCSD model relies on diverse feature fusion, primarily through the multi-channel slope and decay (MCSD) block, to represent features robustly. This block comprises slope and decay sections that extract features across diverse temporal receptive fields, capturing both local and global information. In addition, the MCSD block fuses these diverse features element-wise to further strengthen fine-grained feature extraction. For inference, we formulate the inference process as a recurrent representation, slashing space complexity to and time…
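The abstract states that inference is reformulated as a recurrent representation. A minimal sketch shows why such a reformulation can work. It assumes a decay section behaves like a per-channel exponential decay, which is our assumption rather than the paper's definition. The same outputs can be computed in parallel over the whole prefix or by carrying a single state forward step by step. The function names `decay_parallel` and `decay_recurrent` are hypothetical.

```python
import math
import random
from typing import List


def decay_parallel(x: List[float], gamma: float) -> List[float]:
    """Parallel form (assumed exponential decay): y[t] = sum_{s<=t} gamma**(t-s) * x[s].

    Each output re-reads the whole prefix: quadratic work in len(x), but every
    position can be computed independently (training-friendly).
    """
    return [sum(gamma ** (t - s) * x[s] for s in range(t + 1)) for t in range(len(x))]


def decay_recurrent(x: List[float], gamma: float) -> List[float]:
    """Recurrent form of the same operator: y[t] = gamma * y[t-1] + x[t].

    Only one running state is carried between steps, so the state does not
    grow with sequence length and total work is linear in len(x).
    """
    out, state = [], 0.0
    for xt in x:
        state = gamma * state + xt
        out.append(state)
    return out


if __name__ == "__main__":
    random.seed(0)
    seq = [random.uniform(-1.0, 1.0) for _ in range(64)]
    for gamma in (0.5, 0.9, 0.99):
        par = decay_parallel(seq, gamma)
        rec = decay_recurrent(seq, gamma)
        # Both forms compute the same sequence up to floating-point rounding.
        assert all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(par, rec))
    print("parallel and recurrent forms agree")
```

The design choice mirrors a common pattern for linear-time sequence models: train with the parallel form and decode with the recurrent one, because the recurrent form needs only the previous state to produce the next token.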
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Balanced Selection
