Grouped self-attention mechanism for a memory-efficient Transformer