Speech-Image Semantic Alignment Does Not Depend on Any Prior Classification Tasks