DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models

Wei He; Kai Han; Yehui Tang; Chengcheng Wang; Yujie Yang; Tianyu Guo; Yunhe Wang

arXiv:2403.00818·cs.CL·March 6, 2024·5 cites

Open Access · 1 Repo · 2 Models

TL;DR

DenseMamba introduces DenseSSM, a novel architecture that enhances state space models with dense hidden connections, significantly improving performance of large language models while maintaining efficiency.

Contribution

The paper proposes DenseSSM, a new method that improves state space models by adding dense hidden connections, leading to better performance without sacrificing efficiency.

Findings

01 DenseRetNet outperforms the original RetNet by up to 5% in accuracy.

02 DenseSSM maintains training parallelizability and inference efficiency.

03 DenseSSM is applicable to various SSM types, such as RetNet and Mamba.

Abstract

Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture. While state space models (SSMs) are a new type of foundational network architecture offering lower computational complexity, their performance has yet to fully rival that of Transformers. This paper introduces DenseSSM, a novel approach to enhance the flow of hidden information between layers in SSMs. By selectively integrating shallow-layer hidden states into deeper layers, DenseSSM retains fine-grained information crucial for the final output. Even with the added dense connections, DenseSSM still maintains training parallelizability and inference efficiency. The proposed method is widely applicable to various SSM types, such as RetNet and Mamba. With similar model size, DenseSSM achieves significant improvements, exemplified by…
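The idea of integrating shallow-layer hidden states into deeper layers can be illustrated with a toy sketch. This is not the authors' implementation: the scalar recurrence `h_t = a * h_{t-1} + b * x_t`, the sigmoid gate, the additive fusion, the window size `m`, and all function names (`ssm_scan`, `dense_fuse`, `dense_stack`) are assumptions chosen only to make the data flow concrete.

```python
import math


def ssm_scan(xs, a, b):
    """Toy per-channel linear recurrence: h_t = a * h_{t-1} + b * x_t."""
    h = [0.0] * len(xs[0])
    states = []
    for x in xs:
        h = [a * hi + b * xi for hi, xi in zip(h, x)]
        states.append(h)
    return states


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_fuse(current, shallow_states, gate_weight=1.0):
    """Add gated copies of shallower layers' hidden states to the current ones.

    The gate (a sigmoid of the current hidden value) is a hypothetical
    stand-in for a learned selection mechanism.
    """
    fused = []
    for t, h_t in enumerate(current):
        out = list(h_t)
        for prev in shallow_states:
            for c, p in enumerate(prev[t]):
                gate = sigmoid(gate_weight * h_t[c])
                out[c] += gate * p
        fused.append(out)
    return fused


def dense_stack(xs, num_layers=3, m=2, a=0.9, b=0.1):
    """Stack toy SSM layers; each layer also sees the last m layers' hidden states."""
    history = []  # hidden states of every layer so far (assumed: pre-fusion states)
    layer_input = xs
    for _ in range(num_layers):
        hidden = ssm_scan(layer_input, a, b)
        # m == 0 disables the dense connections (plain stacking).
        fused = dense_fuse(hidden, history[-m:]) if m > 0 else hidden
        history.append(hidden)
        layer_input = fused
    return layer_input


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [0.5, -1.0]]  # 2 time steps, 2 channels
    print(dense_stack(tokens, num_layers=3, m=2))
```

In this sketch each layer still runs the same recurrence over the sequence; the fusion step only combines states that already exist at the same time step, so the dense connections add work per layer without changing how the recurrence itself is evaluated. Setting `m=0` recovers a plain stack, which makes it easy to compare the two variants on the same input.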

Code & Models

Repositories

wailordhe/densessm
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Dropout · Multi-Head Attention · Softmax · Dense Connections · Label Smoothing · Adam