VHASR: A Multimodal Speech Recognition System With Vision Hotwords