Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders