Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion