Can Vision Language Models Understand Mimed Actions?