Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning
Yunshuo Tian, Akayou Kitessa, Tanuja Chitnis, and Yijun Zhao

TL;DR
This paper presents a reciprocal co-training framework coupling large language models with Random Forest classifiers via reinforcement learning, enabling mutual improvement despite their different paradigms.
Contribution
It introduces a novel iterative feedback mechanism that allows gradient-based and non-differentiable models to enhance each other through reinforcement learning.
Findings
Consistent performance improvements across three medical datasets.
Strong enhancement of LLM capabilities through co-training.
Ablation studies highlight the importance of iterative refinement and hybrid rewards.
Abstract
Large language models (LLMs) and classical machine learning methods offer complementary strengths for predictive modeling, yet their fundamentally different representations and training paradigms hinder effective integration: LLMs rely on gradient-based optimization over textual data, whereas models such as Random Forests (RF) employ non-differentiable feature partitioning. This work introduces a reciprocal co-training framework that couples an LLM with an RF classifier via reinforcement learning, creating an iterative feedback loop in which each model improves using signals from the other. Tabular data are reformulated into standardized textual representations for the LLM, whose embeddings augment the RF feature space, while calibrated RF probability estimates provide feedback signals that guide reinforcement learning updates of the LLM. Experiments across three medical datasets…
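The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`row_to_text`, `augment_features`, `reward_from_rf`), the text template, the mixing weight `alpha`, and the specific form of the reward are all assumptions made for illustration. The summary mentions "hybrid rewards" but does not define them here, so the blend of label correctness and RF probability below is only one plausible reading.

```python
from typing import List, Mapping, Sequence


def row_to_text(row: Mapping[str, object]) -> str:
    """Serialize one tabular record into a fixed-template sentence.

    Keys are sorted so the same record always produces the same text
    (an assumed way to obtain a "standardized" representation).
    """
    return "; ".join(f"{name} is {row[name]}" for name in sorted(row)) + "."


def augment_features(raw: Sequence[float], embedding: Sequence[float]) -> List[float]:
    """Concatenate raw tabular features with an LLM embedding.

    The combined vector is what the Random Forest would be trained on.
    """
    return list(raw) + list(embedding)


def reward_from_rf(
    llm_label: int,
    true_label: int,
    rf_proba: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical hybrid reward for the reinforcement learning update.

    Mixes (a) whether the LLM's predicted label is correct with
    (b) the calibrated RF probability assigned to that same label.
    `alpha` is an assumed mixing weight, not a value from the paper.
    """
    correctness = 1.0 if llm_label == true_label else 0.0
    rf_support = rf_proba[llm_label]
    return alpha * correctness + (1.0 - alpha) * rf_support
```

In a full loop, the embedding passed to `augment_features` would come from the LLM's encoding of `row_to_text(row)`, and `rf_proba` from a calibrated Random Forest's class probabilities; both are left as plain inputs here so the sketch stays independent of any particular model or library.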
