Efficient Training of Language Models with Compact and Consistent Next Token Distributions