One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks