Two Heads Are Better than One: Simulating Large Transformers with Small Ones