TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection