Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search