Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization | Tomesphere