Hierarchical vs. Flat Iteration in Shared-Weight Transformers