Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling