LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport