Weight subcloning: direct initialization of transformers using larger pretrained ones