Spatio-Temporal Attention Models for Grounded Video Captioning