CORE: Collaborative Reasoning via Cross Teaching
Kshitij Mishra, Mirat Aubakirov, Martin Takac, Nils Lukas, Salem Lahlou

TL;DR
CORE introduces a collaborative training framework for large language models that leverages peer success signals and diversity to significantly improve reasoning accuracy with minimal training data.
Contribution
The paper proposes a novel cross-teaching protocol that enhances reasoning performance by converting peer success into learning signals during training.
Findings
Achieves over 99% Pass@2 on GSM8K with only 1,000 examples.
Significantly outperforms single-model training on multiple reasoning datasets.
Effective with small models and limited training tokens.
Abstract
Large language models exhibit complementary reasoning errors: on the same instance, one model may succeed with a particular decomposition while another fails. We propose Collaborative Reasoning (CORE), a training-time collaboration framework that converts peer success into a learning signal via a cross-teaching protocol. Each problem is solved in two stages: a cold round of independent sampling, followed by a contexted rescue round in which models that failed receive hint extracted from a successful peer. CORE optimizes a combined reward that balances (i) correctness, (ii) a lightweight DPP-inspired diversity term to reduce error overlap, and (iii) an explicit rescue bonus for successful recovery. We evaluate CORE across four standard reasoning datasets GSM8K, MATH, AIME, and GPQA. With only 1,000 training examples, a pair of small open source models (3B+4B) reaches Pass@2 of 99.54% on…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
