BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization