Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation