$C^3$: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues