Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining