One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer