BinaryPPO: Efficient Policy Optimization for Binary Classification

Punya Syon Pandey; Zhijing Jin

arXiv:2602.02708·cs.LG·February 4, 2026

TL;DR

BinaryPPO introduces a reinforcement learning framework that reformulates binary classification as reward maximization, significantly improving accuracy over traditional supervised fine-tuning, especially in noisy or imbalanced data scenarios.

Contribution

The paper presents BinaryPPO, an offline RL method that recasts binary classification as reward maximization and trains with a PPO variant under a confidence-weighted reward. It reports higher accuracy than supervised fine-tuning across eight domain-specific benchmarks.

Findings

01

BinaryPPO achieves up to 99% accuracy.

02

It improves accuracy by 40-60 percentage points over baselines.

03

Reward shaping and policy stability are key to success.

Abstract

Supervised fine-tuning (SFT) is the standard approach for binary classification tasks such as toxicity detection, factuality verification, and causal inference. However, SFT often performs poorly in real-world settings with label noise, class imbalance, or sparse supervision. We introduce BinaryPPO, an offline reinforcement learning large language model (LLM) framework that reformulates binary classification as a reward maximization problem. Our method leverages a variant of Proximal Policy Optimization (PPO) with a confidence-weighted reward function that penalizes uncertain or incorrect predictions, enabling the model to learn robust decision policies from static datasets without online interaction. Across eight domain-specific benchmarks and multiple models with differing architectures, BinaryPPO improves accuracy by 40-60 percentage points, reaching up to 99%, substantially…
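
The abstract describes a confidence-weighted reward that penalizes uncertain or incorrect predictions but does not give its exact form. The sketch below is one hypothetical way such a reward could be written, not the authors' formula: the function name `confidence_weighted_reward`, the `uncertainty_penalty` coefficient, and the choice of binary entropy as the uncertainty measure are all assumptions made for illustration.

```python
import math


def confidence_weighted_reward(
    p_positive: float,
    label: int,
    uncertainty_penalty: float = 0.5,  # hypothetical coefficient, not from the paper
) -> float:
    """Illustrative reward: signed confidence minus an entropy penalty.

    p_positive is the model's probability for the positive class,
    label is the ground-truth class (0 or 1).
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must lie in [0, 1]")
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")

    # Hard decision and the confidence attached to it (in [0.5, 1]).
    prediction = 1 if p_positive >= 0.5 else 0
    confidence = max(p_positive, 1.0 - p_positive)

    # Correct predictions earn +confidence, incorrect ones -confidence.
    signed = confidence if prediction == label else -confidence

    # Binary entropy in bits (in [0, 1]); high when the model is unsure.
    entropy = 0.0
    for q in (p_positive, 1.0 - p_positive):
        if q > 0.0:
            entropy -= q * math.log2(q)

    return signed - uncertainty_penalty * entropy
```

Under these assumptions, a confident correct prediction scores close to 1, a confident wrong prediction scores below -1 once the entropy term is included, and a prediction near 0.5 earns little even when it happens to be correct. A scalar of this kind could then serve as the per-example reward in a PPO-style update over a static dataset.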

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning