Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models
Xiangming Gu, Soham De, Larisa Markeeva, Petar Veličković, Razvan Pascanu

TL;DR
This paper compares sequential and parallel sampling strategies in large reasoning models, finding that exploration deficits in sequential sampling largely explain the performance gap despite its higher representation capacity.
Contribution
It provides a rigorous comparison of sampling strategies and identifies exploration as the key factor behind the performance difference in large reasoning models.
Findings
Parallel sampling outperforms sequential sampling despite lower representation capacity.
Aggregation and context length are not primary reasons for the performance gap.
Limited exploration in sequential sampling significantly contributes to its poorer performance.
Abstract
Large Reasoning Models (LRMs) have shown remarkable performance on challenging questions, such as math and coding. However, to obtain a high-quality solution, one may need to sample more than once. In principle, there are two sampling strategies that can be composed to form more complex processes: sequential sampling and parallel sampling. In this paper, we first compare these two approaches with rigor, and observe, in line with previous work, that parallel sampling seems to outperform sequential sampling even though the latter should have more representation power. To understand the underlying reasons, we form three hypotheses about this behavior: (i) parallel sampling outperforms due to the aggregator operator; (ii) sequential sampling is harmed by needing to use longer contexts; (iii) sequential sampling leads to less exploration due to conditioning on previous…
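To make the two strategies concrete, here is a minimal Python sketch of how they differ in what each sample is conditioned on. Everything in it is an illustrative assumption rather than the authors' setup: the `Sampler` interface, the "Previous attempt:" prompt format, majority voting as the aggregator, and `toy_sampler`, a random stand-in for a model call that tends to repeat its latest attempt when one appears in the prompt.

```python
import random
from collections import Counter
from typing import Callable, List

# Hypothetical sampler interface: (prompt, rng) -> answer string.
Sampler = Callable[[str, random.Random], str]

# Hypothetical prompt format for feeding earlier attempts back in.
MARKER = "\nPrevious attempt: "


def parallel_sample(sampler: Sampler, question: str, n: int,
                    rng: random.Random) -> List[str]:
    """Draw n answers independently; every call sees only the question."""
    return [sampler(question, rng) for _ in range(n)]


def sequential_sample(sampler: Sampler, question: str, n: int,
                      rng: random.Random) -> List[str]:
    """Draw n answers in order; each call also sees all earlier answers."""
    answers: List[str] = []
    for _ in range(n):
        context = question + "".join(MARKER + a for a in answers)
        answers.append(sampler(context, rng))
    return answers


def majority_vote(answers: List[str]) -> str:
    """One possible aggregator (an assumption): return the most common answer."""
    return Counter(answers).most_common(1)[0][0]


def toy_sampler(prompt: str, rng: random.Random) -> str:
    """Toy stand-in for a model call, not a real model.

    If the prompt contains an earlier attempt, repeat the latest one with
    probability 0.8; otherwise guess uniformly from four candidate answers.
    """
    if MARKER in prompt and rng.random() < 0.8:
        return prompt.rsplit(MARKER, 1)[1]
    return rng.choice(["A", "B", "C", "D"])


if __name__ == "__main__":
    rng = random.Random(0)
    par = parallel_sample(toy_sampler, "Q", 8, rng)
    seq = sequential_sample(toy_sampler, "Q", 8, rng)
    print("parallel:  ", par, "distinct:", len(set(par)))
    print("sequential:", seq, "distinct:", len(set(seq)))
    print("majority vote over parallel samples:", majority_vote(par))
```

The toy sampler hard-codes the kind of anchoring that hypothesis (iii) describes, so any difference in the number of distinct answers it produces illustrates the mechanism only and is not evidence for the paper's findings.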
