Loading paper
Can We Predict Performance of Large Models across Vision-Language Tasks? | Tomesphere