Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos