MINOTAUR: Multi-task Video Grounding From Multimodal Queries