UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer