GrootVL: Tree Topology is All You Need in State Space Model
Yicheng Xiao, Lin Song, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan

TL;DR
GrootVL introduces a dynamic tree topology in state space models to improve long-range dependency modeling, achieving superior performance on visual and textual tasks with efficient computation.
Contribution
The paper proposes GrootVL, a novel state space model with dynamic tree topology generation and a linear complexity algorithm for enhanced long-range interactions.
Findings
Outperforms existing structured state space models on image tasks
Improves textual task performance with minimal fine-tuning cost
Achieves strong representation capabilities across modalities
Abstract
State space models, which employ recursively propagated features, demonstrate strong representation capabilities comparable to Transformer models along with superior efficiency. However, limited by the inherent geometric constraints of sequences, they still fall short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree topology based on spatial relationships and input features. Then, feature propagation is performed based on this graph, thereby breaking the original sequence constraints to achieve stronger representation capabilities. Additionally, we introduce a linear complexity dynamic programming algorithm to enhance long-range interactions without increasing computational cost. GrootVL is a versatile multimodal framework that can be applied to both visual and textual tasks. Extensive experiments…
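The two steps named in the abstract (building a tree from spatial relationships and input features, then propagating features along it in linear time) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the tree is a minimum spanning tree over a 4-connected pixel grid, that edge weights are Euclidean distances between neighbouring features, and that the influence of node j on node i decays as exp(-d(i, j)) along the tree path. The names `build_tree` and `tree_propagate` are hypothetical.

```python
import math


def build_tree(feat):
    """Kruskal minimum spanning tree over a 4-connected grid (assumed topology).

    feat: H x W nested list of feature tuples.
    Returns an adjacency list: adj[u] = [(v, weight), ...].
    """
    H, W = len(feat), len(feat[0])
    edges = []
    for y in range(H):
        for x in range(W):
            u = y * W + x
            if x + 1 < W:  # right neighbour
                edges.append((math.dist(feat[y][x], feat[y][x + 1]), u, u + 1))
            if y + 1 < H:  # bottom neighbour
                edges.append((math.dist(feat[y][x], feat[y + 1][x]), u, u + W))
    edges.sort()  # sorting is O(N log N); only the propagation below is O(N)

    root_of = list(range(H * W))  # union-find forest

    def find(a):
        while root_of[a] != a:
            root_of[a] = root_of[root_of[a]]  # path halving
            a = root_of[a]
        return a

    adj = [[] for _ in range(H * W)]
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # keep the edge only if it joins two components
            root_of[ru] = rv
            adj[u].append((v, w))
            adj[v].append((u, w))
    return adj


def tree_propagate(adj, x):
    """Return y with y[i] = sum_j exp(-d(i, j)) * x[j] using two O(N) passes.

    d(i, j) is the sum of edge weights on the unique tree path from i to j.
    """
    n = len(adj)
    parent = [-1] * n
    decay = [1.0] * n  # exp(-w) for the edge linking a node to its parent
    order = [0]        # BFS order from an arbitrary root (node 0)
    for u in order:    # the list grows while we iterate over it
        for v, w in adj[u]:
            if v != parent[u]:
                parent[v] = u
                decay[v] = math.exp(-w)
                order.append(v)

    up = list(x)
    for v in reversed(order[1:]):  # leaves to root: aggregate each subtree
        up[parent[v]] += decay[v] * up[v]

    out = list(up)
    for v in order[1:]:            # root to leaves: add the rest of the tree
        a = decay[v]
        # out[parent] already contains a * up[v]; remove it before passing down
        out[v] = up[v] + a * (out[parent[v]] - a * up[v])
    return out


if __name__ == "__main__":
    grid = [[(0.0,), (0.1,)], [(0.9,), (1.0,)]]  # toy 2 x 2 single-channel map
    tree = build_tree(grid)
    print(tree_propagate(tree, [1.0, 1.0, 1.0, 1.0]))
```

Each node is visited once per pass, so propagation costs O(N) even though every output depends on every input. A naive sum over all node pairs would cost O(N^2).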
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention
