Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning

Yunshuo Tian, Akayou Kitessa, Tanuja Chitnis, and Yijun Zhao

arXiv:2604.16378·cs.CL·April 21, 2026

TL;DR

This paper presents a reciprocal co-training framework coupling large language models with Random Forest classifiers via reinforcement learning, enabling mutual improvement despite their different paradigms.

Contribution

It introduces a novel iterative feedback mechanism that allows gradient-based and non-differentiable models to enhance each other through reinforcement learning.

Findings

01. Consistent performance improvements across three medical datasets.

02. Strong enhancement of LLM capabilities through co-training.

03. Ablation studies highlight the importance of iterative refinement and hybrid rewards.

Abstract

Large language models (LLMs) and classical machine learning methods offer complementary strengths for predictive modeling, yet their fundamentally different representations and training paradigms hinder effective integration: LLMs rely on gradient-based optimization over textual data, whereas models such as Random Forests (RF) employ non-differentiable feature partitioning. This work introduces a reciprocal co-training framework that couples an LLM with an RF classifier via reinforcement learning, creating an iterative feedback loop in which each model improves using signals from the other. Tabular data are reformulated into standardized textual representations for the LLM, whose embeddings augment the RF feature space, while calibrated RF probability estimates provide feedback signals that guide reinforcement learning updates of the LLM. Experiments across three medical datasets…
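The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`row_to_text`, `augment_features`, `rf_reward`), the "name is value" sentence template, and the linear rescaling of the probability into a reward are all assumptions made for illustration. The excerpt does not specify the paper's actual text template, embedding model, or reward definition. The embedding is passed in as a plain list of floats, and the calibrated RF probability as a single number.

```python
from typing import List, Mapping, Sequence


def row_to_text(row: Mapping[str, object]) -> str:
    """Serialize one tabular record into a standardized sentence.

    Assumed template: features in sorted order, each as "name is value".
    """
    parts = [f"{name} is {row[name]}" for name in sorted(row)]
    return "; ".join(parts) + "."


def augment_features(tabular: Sequence[float],
                     embedding: Sequence[float]) -> List[float]:
    """Append an LLM embedding to the raw tabular features.

    The resulting vector is what the Random Forest would be trained on.
    """
    return list(tabular) + list(embedding)


def rf_reward(p_positive: float, llm_label: int) -> float:
    """Turn a calibrated RF probability into a scalar reward in [-1, 1].

    Assumed shaping: take the RF probability of the label the LLM chose
    and rescale it linearly, so agreement with a confident RF is rewarded
    and disagreement is penalized.
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be a probability in [0, 1]")
    if llm_label not in (0, 1):
        raise ValueError("llm_label must be 0 or 1")
    p_chosen = p_positive if llm_label == 1 else 1.0 - p_positive
    return 2.0 * p_chosen - 1.0


if __name__ == "__main__":
    record = {"age": 54, "bmi": 27.5}
    print(row_to_text(record))
    print(augment_features([54.0, 27.5], [0.1, -0.3]))
    print(rf_reward(0.75, 1), rf_reward(0.75, 0))
```

In a full loop, the reward would feed a policy-gradient style update of the LLM, and the retrained LLM would supply fresh embeddings for the next RF fit. Both steps are omitted here because the excerpt does not describe them in enough detail.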
