MCSD: An Efficient Language Model with Diverse Fusion

Hua Yang; Duohai Li; Shiman Li

arXiv:2406.12230·cs.CL·July 12, 2024

TL;DR

The paper introduces MCSD, a resource-efficient language model with linear scaling and diverse feature fusion, achieving high throughput and low memory usage while maintaining competitive performance for edge applications.

Contribution

We propose MCSD, a novel language model with a multi-channel slope and decay block for diverse feature fusion, enabling efficient inference with linear complexity.

Findings

01. MCSD achieves higher inference throughput than Transformers.

02. MCSD uses less GPU memory than Transformers while maintaining performance.

03. MCSD achieves accuracy comparable to that of larger models on benchmarks.

Abstract

Transformers excel in Natural Language Processing (NLP) due to their prowess in capturing long-term dependencies but suffer from exponential resource consumption with increasing sequence lengths. To address these challenges, we propose the MCSD model, an efficient language model with linear scaling and fast inference speed. The MCSD model leverages diverse feature fusion, primarily through the multi-channel slope and decay (MCSD) block, to robustly represent features. This block comprises slope and decay sections that extract features across diverse temporal receptive fields, facilitating the capture of both local and global information. In addition, the MCSD block conducts element-wise fusion of diverse features to further enhance its fine-grained feature extraction capability. For inference, we reformulate the process as a recurrent representation, reducing space complexity to $O(1)$ and time…
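The recurrent formulation mentioned in the abstract can be illustrated with a generic exponential-decay accumulation. The sketch below is not the MCSD block itself: the function names, the single scalar decay rate `gamma`, and the update rule are illustrative assumptions, chosen only to show how a decay-weighted sum over the whole history can be replaced by a fixed-size running state.

```python
import numpy as np


def decay_parallel(x, gamma):
    """Compute y_t = sum_{s<=t} gamma**(t-s) * x_s from the full history.

    Hypothetical helper: builds an (n, n) weight matrix, so memory grows
    with the sequence length n.
    """
    n = x.shape[0]
    t = np.arange(n)
    # Exponent t - s for s <= t; entries above the diagonal are zeroed by tril.
    exponents = (t[:, None] - t[None, :]).clip(min=0)
    weights = np.tril(gamma ** exponents)
    return weights @ x


def decay_recurrent(x, gamma):
    """Compute the same outputs one step at a time.

    Hypothetical helper: only `state` (one vector per feature channel) is
    carried between steps, so the per-step memory does not depend on n.
    """
    state = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for i, x_t in enumerate(x):
        state = gamma * state + x_t  # decay the old state, add the new input
        out[i] = state
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 4))  # 8 time steps, 4 feature channels
    # Both forms should agree up to floating-point error.
    assert np.allclose(decay_parallel(x, 0.9), decay_recurrent(x, 0.9))
```

The parallel form suits training, where the whole sequence is available at once; the recurrent form suits step-by-step generation, since each new token needs only the previous state rather than the full history.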

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Balanced Selection