Loading paper
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization | Tomesphere