Stochastic Attention Head Removal: A simple and effective method for improving Transformer Based ASR Models