DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information