ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers