M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers