Leveraging Vision-Language Foundation Models for Fine-Grained Downstream Tasks