Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning