Contextual inference from single objects in Vision-Language models