Global2Local: A Joint-Hierarchical Attention for Video Captioning