Loading paper
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | Tomesphere