Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks