Alternatives to the Scaled Dot Product for Attention in the Transformer Neural Network Architecture