LocCa: Visual Pretraining with Location-aware Captioners