Automatic benchmarking of large multimodal models via iterative experiment programming