DeMansia: Mamba Never Forgets Any Tokens
Ricky Fang

TL;DR
DeMansia introduces a novel transformer-based architecture that combines state space models and token labeling to improve image classification efficiency and performance, addressing limitations of traditional transformers in handling long sequences.
Contribution
This paper presents DeMansia, a new architecture that integrates state space models with token labeling, advancing transformer efficiency for image classification tasks.
Findings
DeMansia outperforms existing models on benchmark datasets.
It effectively handles long sequences with reduced computational cost.
The architecture demonstrates superior accuracy compared to traditional transformers.
Abstract
This paper examines the mathematical foundations of transformer architectures, highlighting their limitations particularly in handling long sequences. We explore prerequisite models such as Mamba, Vision Mamba (ViM), and LV-ViT that pave the way for our proposed architecture, DeMansia. DeMansia integrates state space models with token labeling techniques to enhance performance in image classification tasks, efficiently addressing the computational challenges posed by traditional transformers. The architecture, benchmark, and comparisons with contemporary models demonstrate DeMansia's effectiveness. The implementation of this paper is available on GitHub at https://github.com/catalpaaa/DeMansia
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAfrican history and culture studies
MethodsLV-ViT · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
