Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset