MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs

Purbesh Mitra; Sennur Ulukus

arXiv:2507.02851 · cs.CL · July 4, 2025

TL;DR

MOTIF introduces a reinforcement learning fine-tuning approach enabling large language models to perform modular, multi-round reasoning beyond their context size limits, improving accuracy efficiently.

Contribution

The paper presents MOTIF, a novel RL fine-tuning method that enhances LLM reasoning by enabling multi-round thinking over larger contexts, with improved accuracy and sample efficiency.

Findings

01

Achieved 3.8% and 3.3% accuracy improvements over vanilla GRPO training on the MATH500 and AIME2024 benchmarks, respectively.

02

Demonstrated effective reasoning beyond context size limits.

03

Training was sample-efficient: the improvements were obtained with only 15% of the samples.

Abstract

Recent advancements in the reasoning capabilities of large language models (LLMs) show that employing the group relative policy optimization (GRPO) algorithm for reinforcement learning (RL) training allows the models to use more thinking/reasoning tokens for generating better responses. However, LLMs can generate only a finite number of tokens while maintaining attention to the previously generated tokens. This limit, also known as the context size of an LLM, is a bottleneck in LLM reasoning with an arbitrarily large number of tokens. To think beyond the limit of context size, an LLM must employ a modular thinking strategy to reason over multiple rounds. In this work, we propose MOTIF: Modular Thinking via Reinforcement Finetuning, an RL training method for generating thinking tokens in multiple rounds, effectively allowing the model to think with additional context size. We…
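
The multi-round idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `generate` callable, the `SUMMARY:` marker, the prompt layout, and the character cap standing in for a token limit are all hypothetical choices made for illustration. The point is only that each round sees the question plus a short carried-over summary, never the full earlier reasoning, so the total amount of thinking can exceed what a single context window holds.

```python
from typing import Callable, List, Tuple


def extract_summary(response: str, marker: str = "SUMMARY:") -> str:
    """Return the text after the last marker, or the whole response if absent."""
    _, found, tail = response.rpartition(marker)
    return tail.strip() if found else response.strip()


def multi_round_think(
    question: str,
    generate: Callable[[str], str],  # hypothetical model call: prompt -> text
    num_rounds: int = 3,
    max_chars: int = 2000,           # crude stand-in for a per-round token limit
) -> Tuple[str, List[str]]:
    """Reason over several bounded rounds, carrying only a summary forward."""
    carried = ""                     # the only state passed between rounds
    transcripts: List[str] = []
    for _ in range(num_rounds):
        prompt = f"Question: {question}\nProgress so far: {carried}\n"
        response = generate(prompt)[:max_chars]
        transcripts.append(response)
        carried = extract_summary(response)
    return carried, transcripts


def toy_generate(prompt: str) -> str:
    """Toy stand-in for a model: increments the number found in the progress field."""
    last = prompt.rsplit("Progress so far:", 1)[1].strip()
    n = int(last) if last.isdigit() else 0
    return f"thinking... SUMMARY: {n + 1}"


final_summary, rounds = multi_round_think("toy question", toy_generate, num_rounds=3)
```

With the toy generator, each round only ever reads the previous round's one-number summary, which mirrors how a real model would be prompted with a compact summary instead of its full prior output. How the rounds are rewarded during RL training is outside the scope of this sketch.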
