Loading paper
Text-Derived Knowledge Helps Vision: A Simple Cross-modal Distillation for Video-based Action Anticipation | Tomesphere