Loading paper
Variational Stacked Local Attention Networks for Diverse Video Captioning | Tomesphere