BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval