Transformer tricks: Precomputing the first layer