Bridging the Gap between Audio and Text using Parallel-attention for User-defined Keyword Spotting