Multimodal Token Fusion for Vision Transformers