Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning