Fine-Grained Grounding for Multimodal Speech Recognition