Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators