Multi-Headed Transformer Architectures as Time-dependent Wasserstein Gradient Flows