Vocabulary-free Fine-grained Visual Recognition via Enriched Contextually Grounded Vision-Language Model