Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently

Bochen Lyu; Yiyang Jia; Xiaohao Cai; Zhanxing Zhu

arXiv:2511.17852 · cs.LG · November 25, 2025

Open Access

TL;DR

This paper provides a theoretical comparison of reinforcement learning and supervised fine-tuning in transformers, showing they learn sparse Boolean functions differently, with implications for understanding Chain-of-Thought reasoning.

Contribution

It offers the first theoretical analysis of how RL and SFT enable transformers to learn sparse Boolean functions, revealing their distinct learning dynamics.

Findings

01 RL learns the entire CoT chain simultaneously

02 SFT learns the CoT chain step-by-step

03 Both approaches can learn k-sparse Boolean functions under certain conditions

Abstract

Transformers can acquire Chain-of-Thought (CoT) capabilities to solve complex reasoning tasks through fine-tuning. Reinforcement learning (RL) and supervised fine-tuning (SFT) are two primary approaches to this end, yet their underlying mechanisms and differences remain theoretically unclear. In this work, we examine these aspects specifically for learning $k$-sparse Boolean functions with a one-layer transformer and intermediate supervision that is akin to CoT. In particular, we consider $k$-sparse Boolean functions that can be recursively decomposed into fixed 2-sparse Boolean functions. We analyze the learning dynamics of fine-tuning the transformer via either RL or SFT with CoT to identify sufficient conditions for it to provably learn these functions. We verify that these conditions hold for three basic examples, including $k$-PARITY, $k$-AND, and $k$-OR, thus demonstrating the…
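To make the recursive decomposition concrete, here is a minimal Python sketch of how a $k$-sparse function such as $k$-PARITY, $k$-AND, or $k$-OR can be computed by repeatedly applying one fixed 2-sparse function, with the intermediate values playing the role of CoT steps. It is an illustration, not the paper's construction: the left-to-right chain order, the {0, 1} input encoding, and the names `cot_chain`, `xor_gate`, `and_gate`, and `or_gate` are all assumptions introduced here.

```python
from itertools import product
from typing import Callable, List, Sequence

# A fixed 2-sparse Boolean function: two bits in, one bit out.
Gate = Callable[[int, int], int]


def xor_gate(a: int, b: int) -> int:
    return a ^ b


def and_gate(a: int, b: int) -> int:
    return a & b


def or_gate(a: int, b: int) -> int:
    return a | b


def cot_chain(x: Sequence[int], relevant: Sequence[int], gate: Gate) -> List[int]:
    """Return the k-1 intermediate values of a left-to-right chain.

    Assumed (hypothetical) decomposition:
        z_1 = gate(x[i_1], x[i_2]),  z_j = gate(z_{j-1}, x[i_{j+1}]).
    The last entry is the value of the k-sparse function on x.
    Requires k = len(relevant) >= 2.
    """
    bits = [x[i] for i in relevant]
    chain = []
    acc = bits[0]
    for b in bits[1:]:
        acc = gate(acc, b)
        chain.append(acc)
    return chain


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 1]   # input of length 6
    relevant = [0, 2, 3]     # k = 3 relevant coordinates (illustrative choice)
    print(cot_chain(x, relevant, xor_gate))  # intermediate steps for 3-PARITY
    print(cot_chain(x, relevant, and_gate))  # intermediate steps for 3-AND
    print(cot_chain(x, relevant, or_gate))   # intermediate steps for 3-OR
```

In this sketch only the coordinates listed in `relevant` affect the output, which is what makes the target function $k$-sparse. Each entry of the returned list is one intermediate value that CoT-style supervision could expose.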

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Quantum Computing Algorithms and Architecture