PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

Roger Creus Castanyer; Geoffrey Bradway; Lorenz Wolf; Maxwill Lin; Augustine N. Mavor-Parker; Matthew James Sargent

arXiv:2605.16727 · cs.AI · May 19, 2026

TL;DR

PopuLoRA is a population-based self-play framework for reinforcement learning with LLMs. It lets problem difficulty and solving ability co-evolve, which improves performance on reasoning and coding benchmarks.

Contribution

The paper introduces a novel population-based asymmetric self-play method, with LoRA weight-space evolution operators, for training LLMs in a co-evolutionary setting.
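To make "LoRA weight-space evolution operators" concrete, here is a minimal NumPy sketch of one possible mutation and one possible crossover that keep the adapter rank fixed. It assumes an adapter is a pair (B, A) with B of shape (d, r) and A of shape (r, k), so the weight update is ΔW = B A. The Gaussian mutation, the SVD-based crossover, and the names `mutate` and `svd_crossover` are illustrative assumptions, not the paper's operators. Averaging the factors of two parents does not average their updates, because ((B1 + B2)/2)((A1 + A2)/2) ≠ (B1 A1 + B2 A2)/2, so the crossover blends the dense updates and projects the result back to rank r with a truncated SVD.

```python
import numpy as np


def mutate(B, A, sigma, rng):
    """Hypothetical Gaussian mutation: add noise to both LoRA factors.

    The factor shapes, and therefore the rank r, are unchanged.
    """
    return (B + sigma * rng.standard_normal(B.shape),
            A + sigma * rng.standard_normal(A.shape))


def svd_crossover(B1, A1, B2, A2, alpha):
    """Hypothetical crossover: blend the two dense updates, then keep rank r.

    Assumes B has shape (d, r) and A has shape (r, k), so Delta W = B @ A.
    The truncated SVD gives the best rank-r approximation of the blend
    (Eckart-Young), split symmetrically between the two new factors.
    """
    r = A1.shape[0]
    delta = alpha * (B1 @ A1) + (1.0 - alpha) * (B2 @ A2)  # rank <= 2r
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(S[:r])
    B_child = U[:, :r] * sqrt_s          # (d, r): columns scaled by sqrt(S)
    A_child = sqrt_s[:, None] * Vt[:r]   # (r, k): rows scaled by sqrt(S)
    return B_child, A_child


# Usage: two random rank-4 parents, one crossover child, then a mutation.
rng = np.random.default_rng(0)
d, k, r = 64, 32, 4
B1, A1 = rng.standard_normal((d, r)), rng.standard_normal((r, k))
B2, A2 = rng.standard_normal((d, r)), rng.standard_normal((r, k))
Bc, Ac = svd_crossover(B1, A1, B2, A2, alpha=0.5)
Bm, Am = mutate(Bc, Ac, sigma=0.01, rng=rng)
```

With `alpha=1.0` the child reproduces the first parent's update (its rank is already r); intermediate values give the closest rank-r update to the blend, so every child fits the same adapter slot as its parents.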

Findings

1. The population outperforms the single-agent baseline on multiple benchmarks.
2. Co-evolution leads to increasingly complex problems and diverse solutions.
3. The weakest population member surpasses the baseline on aggregate.

Abstract

We introduce PopuLoRA, a population-based asymmetric self-play framework for reinforcement learning with verifiable rewards (RLVR) post-training of LLMs. Teachers and students are specialised LoRA adapters on a shared frozen base: teachers propose problems, matched students solve them under a programmatic verifier, and cross-evaluation between sub-populations replaces the self-calibration that limits single-agent self-play. A family of LoRA weight-space evolution operators (mutations and crossovers that produce same-rank population members in seconds) serves as the replacement step of a population-based training loop at 7B scale. We instantiate PopuLoRA on top of Absolute Zero Reasoner and compare it against a per-adapter compute-matched single-agent baseline. Where the single agent self-calibrates to generating easy problems it can reliably solve, the population enters a…
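The abstract places the evolution operators in the replacement step of a population-based training loop driven by cross-evaluation. The sketch below shows one common form of that step, truncation selection, under loudly stated assumptions: `pass_rate[t, s]` is the fraction of teacher t's problems that student s solves under the verifier, a student's fitness is its mean pass rate across all teachers, and `make_child` stands in for any mutation or crossover operator. How teachers are scored is not specified in the abstract, so this sketch covers only the student sub-population; `student_fitness` and `replacement_step` are hypothetical names.

```python
import numpy as np


def student_fitness(pass_rate):
    """Hypothetical fitness: each student's mean pass rate over all teachers.

    pass_rate has shape (n_teachers, n_students).
    """
    return pass_rate.mean(axis=0)


def replacement_step(population, fitness, make_child, frac, rng):
    """Truncation selection: overwrite the bottom `frac` of the population
    with children of randomly chosen members from the top `frac`.

    Parents are always drawn from the pre-replacement population.
    """
    if not 0.0 < frac <= 0.5:
        raise ValueError("frac must be in (0, 0.5] so losers and winners do not overlap")
    n = len(population)
    n_replace = max(1, int(frac * n))
    order = np.argsort(fitness)              # ascending: worst member first
    losers, winners = order[:n_replace], order[-n_replace:]
    new_pop = list(population)
    for i in losers:
        p1, p2 = rng.choice(winners, size=2, replace=True)
        new_pop[i] = make_child(population[p1], population[p2])
    return new_pop


# Usage: 2 teachers x 4 students; members are placeholder names here.
pass_rate = np.array([[0.9, 0.2, 0.5, 0.7],
                      [0.8, 0.1, 0.6, 0.6]])
fitness = student_fitness(pass_rate)
pop = ["s0", "s1", "s2", "s3"]
new_pop = replacement_step(pop, fitness, lambda a, b: f"child({a},{b})",
                           frac=0.25, rng=np.random.default_rng(0))
```

Capping `frac` at 0.5 keeps the replaced and the parent sets disjoint, so a freshly replaced member is never chosen as a parent within the same step.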
