Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges