Loading paper
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures | Tomesphere