Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Tong Wu; Yang Liu; Jun Bai; Zixia Jia; Shuyi Zhang; Ziyong Lin; Yanting Wang; Song-Chun Zhu; Zilong Zheng

arXiv:2512.07461·cs.CL·May 15, 2026


TL;DR

The paper presents NPR, a framework that enables large language models to perform genuine parallel reasoning through self-distillation and reinforcement learning, reporting up to 4.6x inference speedup and performance gains of up to 24.5% on reasoning benchmarks.

Contribution

NPR introduces a self-distilled training paradigm and a parallel policy optimization algorithm for scalable, parallel reasoning in language models without external supervision.

Findings

01. Achieves performance improvements of up to 24.5% on reasoning benchmarks.

02. Realizes up to 4.6x inference speedup with parallel execution.

03. Demonstrates 100% genuine parallel reasoning, unlike autoregressive baselines.
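To make the speedup finding concrete, the following is a minimal back-of-the-envelope sketch of why decoding independent branches concurrently can reduce the number of decoding steps. It is an idealized fork-join model, not the paper's measurement method: the `ForkJoinTrace` class, its fields, and all token counts are hypothetical and chosen only for illustration, and the model ignores scheduling, memory, and batching overheads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForkJoinTrace:
    """Hypothetical token counts for one reasoning trace (not from the paper)."""
    prefix_tokens: int        # shared tokens decoded before branching
    branch_tokens: List[int]  # tokens decoded inside each independent branch
    join_tokens: int          # tokens decoded after the branches are merged


def sequential_steps(trace: ForkJoinTrace) -> int:
    # A purely autoregressive decoder emits every token one after another.
    return trace.prefix_tokens + sum(trace.branch_tokens) + trace.join_tokens


def parallel_steps(trace: ForkJoinTrace) -> int:
    # If branches are decoded concurrently, the longest branch sets the cost.
    longest = max(trace.branch_tokens, default=0)
    return trace.prefix_tokens + longest + trace.join_tokens


def idealized_speedup(trace: ForkJoinTrace) -> float:
    # Upper bound on speedup under this toy model (no overheads considered).
    return sequential_steps(trace) / parallel_steps(trace)


# Illustrative numbers only.
trace = ForkJoinTrace(prefix_tokens=50,
                      branch_tokens=[200, 180, 220, 190],
                      join_tokens=30)
print(sequential_steps(trace))             # 870
print(parallel_steps(trace))               # 300
print(round(idealized_speedup(trace), 2))  # 2.9
```

Under this toy model the achievable speedup depends on how much of the trace sits inside branches and how evenly the branches are balanced; a long shared prefix or one dominant branch pulls the ratio back toward 1.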

Abstract

We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. NPR transforms the model from sequential emulation to native parallel cognition through three key innovations: 1) a self-distilled progressive training paradigm that transitions from "cold-start" format discovery to strict topological constraints without external supervision; 2) a novel Parallel-Aware Policy Optimization (PAPO) algorithm that optimizes branching policies directly within the execution graph, allowing the model to learn adaptive decomposition via trial and error; and 3) a robust NPR Engine that refactors memory management and flow control of SGLang to enable stable, large-scale parallel RL training. Across eight reasoning benchmarks, NPR trained on Qwen3-4B achieves performance gains of up to 24.5% and…


Code & Models

Repositories

bigai-nlco/Native-Parallel-Reasoner (GitHub)
