Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?