CLIPPO: Image-and-Language Understanding from Pixels Only