Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents