CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

Dachuan Shi; Hanlin Zhu; Xiangchi Yuan; Wanjia Zhao; Kejing Xia; Wen Xiao; Wenke Lee

arXiv:2605.20075·cs.CL·May 20, 2026

TL;DR

CopT is a training-free reasoning pipeline in which a large language model first produces a draft answer and then reflects on it with on-policy thinking, improving accuracy while reducing token costs.

Contribution

CopT reformulates reasoning as answer-first followed by on-policy reflection, and recasts continuous embeddings as inference-time contrastive verifiers that assess whether the draft answer can be trusted. This improves performance and reduces token costs.

Findings

01. Improves peak accuracy by up to 23% across tasks.

02. Reduces token usage by up to 57%.

03. Operates without additional training.

Abstract

Chain-of-thought (CoT) is a standard approach for eliciting reasoning capabilities from large language models (LLMs). However, the common CoT paradigm treats thinking as a prerequisite for answering, which can delay access to plausible answers and incur unnecessary token costs even when the model is able to identify an answer before extended thinking, a behavior known as performative reasoning. In this paper, we introduce CopT, a reformulated reasoning pipeline that reverses the usual order of thinking and answering. Instead of thinking before answering, CopT first elicits a draft answer and then invokes subsequent on-policy thinking conditioned on its own draft answer for reflection and correction. To assess whether the draft answer should be trusted, CopT recasts continuous embeddings as inference-time contrastive verifiers. Specifically, it contrasts the model's support for the same…
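The answer-first ordering described in the abstract can be illustrated with a minimal control-flow sketch. This is not the authors' implementation. The names `answer_first`, `draft_fn`, `trust_fn`, `reflect_fn`, and `threshold` are hypothetical, and the verifier is left as an opaque callable because the abstract is cut off before it specifies what is contrasted. The sketch also assumes that reflection is skipped when the draft is trusted, which the abstract suggests but does not state outright.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    answer: str
    reflected: bool  # True if on-policy thinking was invoked after the draft


def answer_first(
    question: str,
    draft_fn: Callable[[str], str],           # hypothetical: elicits a draft answer directly
    trust_fn: Callable[[str, str], float],    # hypothetical: stand-in for the contrastive verifier
    reflect_fn: Callable[[str, str], str],    # hypothetical: thinking conditioned on the draft
    threshold: float = 0.5,                   # assumed cutoff, not taken from the paper
) -> Result:
    # Step 1: answer before thinking.
    draft = draft_fn(question)

    # Step 2: score how far the draft should be trusted.
    score = trust_fn(question, draft)

    # Step 3 (assumption): keep a trusted draft as-is and spend no thinking tokens.
    if score >= threshold:
        return Result(answer=draft, reflected=False)

    # Step 4: otherwise reflect on the model's own draft and return the revised answer.
    return Result(answer=reflect_fn(question, draft), reflected=True)


if __name__ == "__main__":
    # Toy stand-ins for model calls; a real system would query an LLM here.
    result = answer_first(
        "2 + 2 = ?",
        draft_fn=lambda q: "5",
        trust_fn=lambda q, a: 0.1,
        reflect_fn=lambda q, a: "4",
    )
    print(result)
```

Passing the three stages in as callables keeps the sketch independent of any particular model or verifier, so each stage can be swapped out once the paper's actual procedure is known.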

Code & Models

Repositories

sdc17/CopT (GitHub)