How transformers learn structured data: insights from hierarchical filtering