Entity-aware and Motion-aware Transformers for Language-driven Action Localization in Videos