Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data