Learning to Think from Multiple Thinkers

Nirmit Joshi; Roey Magen; Nathan Srebro; Nikolaos Tsilivis; Gal Vardi

arXiv:2604.24737·cs.LG·April 28, 2026

TL;DR

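This paper studies learning from Chain-of-Thought (CoT) supervision given by multiple thinkers who provide correct but systematically different solutions. It shows that, under cryptographic assumptions, learning can be computationally hard in passive data-collection settings, and it proposes a computationally efficient active learning algorithm.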

Contribution

The paper introduces a framework for learning from multiple thinkers whose solutions are correct but diverse, analyzes the computational hardness of the passive setting, and gives a generic, computationally efficient active learning algorithm.

Findings

01

Under cryptographic assumptions, learning from CoT supervision provided by two or a few different thinkers can be computationally hard in passive data-collection settings.

02

A generic active learning algorithm learns efficiently from a small amount of CoT data per thinker, an amount that is independent of the target accuracy.

03

The method scales logarithmically with the inverse of the target error.

Abstract

We study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a small amount of CoT data per thinker that is completely independent of the target accuracy…
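
As a toy illustration of the data setting the abstract describes (not taken from the paper), the sketch below shows two "thinkers" that compute the same function but leave systematically different step-by-step traces. The function names, the choice of summation as the task, and the trace format are all assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a trace is the list of running totals a thinker writes down.
Trace = List[int]
Thinker = Callable[[List[int]], Tuple[Trace, int]]


def thinker_left_to_right(xs: List[int]) -> Tuple[Trace, int]:
    """Sum the input from the first element to the last, recording each partial sum."""
    trace: Trace = []
    total = 0
    for x in xs:
        total += x
        trace.append(total)
    return trace, total


def thinker_right_to_left(xs: List[int]) -> Tuple[Trace, int]:
    """Sum the input from the last element to the first, recording each partial sum."""
    trace: Trace = []
    total = 0
    for x in reversed(xs):
        total += x
        trace.append(total)
    return trace, total


def end_result_example(xs: List[int], thinker: Thinker) -> Tuple[List[int], int]:
    """End-result supervision: the learner sees only the input and the final answer."""
    _, answer = thinker(xs)
    return xs, answer


def cot_example(xs: List[int], thinker: Thinker) -> Tuple[List[int], Trace, int]:
    """CoT supervision: the learner also sees the thinker's intermediate steps."""
    trace, answer = thinker(xs)
    return xs, trace, answer


xs = [3, 1, 4]
trace_a, y_a = thinker_left_to_right(xs)  # trace [3, 4, 8], answer 8
trace_b, y_b = thinker_right_to_left(xs)  # trace [4, 5, 8], answer 8
```

Both thinkers are correct and agree on every end result, so end-result examples cannot tell them apart, while their CoT examples differ on the same input. A learner that pools CoT data from both thinkers therefore sees inconsistent intermediate steps, which is the kind of mismatch the multi-thinker setting has to cope with.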
