Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning

Senjie Jin; Lu Chen; Zhiheng Xi; Yuhui Wang; Sirui Song; Yuhao Zhou; Xinbo Zhang; Peng Sun; Hong Lu; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2510.25310·cs.CL·October 30, 2025

TL;DR

This paper introduces Parrot, a training pipeline that improves program and natural language chain-of-thought reasoning in large language models at the same time, with N-CoT accuracy gains of over 21 points on MathQA for LLaMA2 and CodeLLaMA.

Contribution

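The paper proposes a training pipeline for mathematical reasoning in which P-CoT and N-CoT enhance each other. It combines three subtasks that integrate sequential P-CoT and N-CoT generation, a subtask hybrid training strategy, and an auxiliary reward based on converted N-CoT.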

Findings

01. Parrot significantly improves both N-CoT and P-CoT performance.

02. N-CoT accuracy improves by more than 21 points on MathQA for both LLaMA2 and CodeLLaMA.

03. The approach outperforms resource-intensive RL baselines.

Abstract

Natural language chain-of-thought (N-CoT) and program chain-of-thought (P-CoT) have emerged as the two primary paradigms for large language models (LLMs) to solve mathematical reasoning problems. Current research typically aims for unidirectional enhancement: P-CoT-enhanced N-CoT or N-CoT-enhanced P-CoT. In this paper, we seek to fully unleash the strengths of both paradigms for mutual enhancement and ultimately achieve simultaneous improvements. We conduct a detailed analysis of the error types across the two paradigms, based on which we propose Parrot, a novel training pipeline for mathematical problems: 1) three target-designed subtasks that integrate sequential P-CoT and N-CoT generation; 2) a subtask hybrid training strategy that facilitates natural language semantic transferability; 3) a converted N-CoT auxiliary reward designed to alleviate the sparse rewards in P-CoT optimization.…
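To make the two paradigms and the sparse-reward issue concrete, the following is a minimal Python sketch and not the authors' implementation. The word problem, all function names, and the `aux_weight` value are hypothetical. In particular, the form of the auxiliary reward (a small bonus when the converted N-CoT reaches the gold answer) is an assumption made for illustration; the abstract does not specify how the reward is computed.

```python
import re

# Hypothetical word problem, invented for illustration (not from the paper).
QUESTION = "Pens cost 3 dollars each. How much do 4 pens and a 5-dollar notebook cost?"
GOLD_ANSWER = 17.0

# N-CoT: the reasoning is written in natural language and ends with the answer.
N_COT = "4 pens cost 4 * 3 = 12 dollars. Adding the notebook gives 12 + 5 = 17. The answer is 17."

# P-CoT: the reasoning is a program; executing it produces the answer.
P_COT = """
pen_price = 3
num_pens = 4
notebook_price = 5
answer = pen_price * num_pens + notebook_price
"""


def run_p_cot(program):
    """Execute a P-CoT program and return its `answer` variable, or None on failure."""
    scope = {}
    try:
        exec(program, scope)
    except Exception:
        return None
    return scope.get("answer")


def extract_n_cot_answer(text):
    """Return the number following 'The answer is' in an N-CoT, or None if absent."""
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


def is_correct(pred, gold, tol=1e-6):
    """Compare a predicted numeric answer with the gold answer."""
    return pred is not None and abs(float(pred) - gold) <= tol


def sparse_reward(program, gold):
    """Outcome-only reward: 1.0 if the executed program matches the gold answer, else 0.0."""
    return 1.0 if is_correct(run_p_cot(program), gold) else 0.0


def shaped_reward(program, converted_n_cot, gold, aux_weight=0.1):
    """Sparse reward plus an ASSUMED bonus when the converted N-CoT is also correct."""
    n_cot_ok = is_correct(extract_n_cot_answer(converted_n_cot), gold)
    return sparse_reward(program, gold) + (aux_weight if n_cot_ok else 0.0)
```

With an outcome-only reward, every failing program receives the same 0.0 signal, which is what makes the reward sparse. An auxiliary term of this kind would give the optimizer a second, denser signal, under the assumptions stated above.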
