Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?