What Language Model to Train if You Have One Million GPU Hours?