Jointly Localizing and Describing Events for Dense Video Captioning