Wisdom of the Crowd: Reinforcement Learning from Coevolutionary Collective Feedback

Wenzhen Yuan; Shengji Tang; Weihao Lin; Jiacheng Ruan; Ganqu Cui; Bo Zhang; Tao Chen; Ting Liu; Yuzhuo Fu; Peng Ye; Lei Bai

arXiv:2508.12338 · cs.AI · August 19, 2025

TL;DR

This paper introduces RLCCF, a collaborative reinforcement learning framework where multiple language models coevolve through collective feedback, significantly improving reasoning accuracy without external supervision.

Contribution

RLCCF is a novel multi-model coevolutionary RL framework that enhances collective reasoning by optimizing ensemble consistency and weighting models by self-confidence.

Findings

01

Achieves an average 16.72% accuracy improvement across benchmarks.

02

Enhances group voting accuracy by 4.51%.

03

Demonstrates effective coevolution of diverse LLMs.

Abstract

Reinforcement learning (RL) has significantly enhanced the reasoning capabilities of large language models (LLMs), but its reliance on expensive human-labeled data or complex reward models severely limits scalability. While existing self-feedback methods aim to address this problem, they are constrained by the capabilities of a single model, which can lead to overconfidence in incorrect answers, reward hacking, and even training collapse. To this end, we propose Reinforcement Learning from Coevolutionary Collective Feedback (RLCCF), a novel RL framework that enables multi-model collaborative evolution without external supervision. Specifically, RLCCF optimizes the ability of a model collective by maximizing its Collective Consistency (CC), which jointly trains a diverse ensemble of LLMs and provides reward signals by voting on collective outputs. Moreover, each model's vote is weighted…
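The voting step described above can be illustrated with a small sketch. It assumes each model contributes one final answer and one non-negative self-confidence weight. The abstract is cut off before it specifies how the weights are computed, so they are passed in as plain inputs here. It also assumes the reward is binary: 1 if an answer agrees with the confidence-weighted consensus, else 0. The function names `weighted_vote` and `collective_rewards` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_vote(answers: Sequence[str], confidences: Sequence[float]) -> str:
    """Return the answer with the largest total confidence-weighted vote.

    Assumption (hypothetical): each model's vote counts with a weight equal
    to its self-confidence score. On a tie, the answer first seen in input
    order wins, because dicts preserve insertion order and max() keeps the
    first maximal key.
    """
    if len(answers) != len(confidences):
        raise ValueError("need exactly one confidence score per answer")
    totals: Dict[str, float] = defaultdict(float)
    for answer, confidence in zip(answers, confidences):
        totals[answer] += confidence  # accumulate weighted votes per answer
    return max(totals, key=totals.get)


def collective_rewards(answers: Sequence[str], confidences: Sequence[float]) -> List[float]:
    """Assign a binary pseudo-reward: 1.0 if an answer matches the weighted consensus."""
    consensus = weighted_vote(answers, confidences)
    return [1.0 if answer == consensus else 0.0 for answer in answers]


if __name__ == "__main__":
    # Two of three models say "B", but the single "A" vote carries more confidence.
    answers = ["A", "B", "B"]
    confidences = [0.9, 0.3, 0.4]
    print(weighted_vote(answers, confidences))       # A
    print(collective_rewards(answers, confidences))  # [1.0, 0.0, 0.0]
```

In this example an unweighted majority vote would pick "B". The confidence weighting instead lets a single high-confidence model outvote two low-confidence ones. The sketch only illustrates the general mechanism of confidence-weighted voting. The paper's actual confidence measure and reward shaping may differ.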


Taxonomy

Topics: Opinion Dynamics and Social Influence · Evolutionary Game Theory and Cooperation