# AF-CuRL: Stable Reinforcement Learning for Resource-Constrained Long-Form Reasoning in Edge-Intelligent Systems

**Authors:** Ziqin Yan, Yurong Wang, Qingsheng Yue, Xiaojiang Wang

PMC · DOI: 10.3390/s26051433 · Sensors (Basel, Switzerland) · 2026-02-25

## TL;DR

AF-CuRL is a lightweight reinforcement learning framework that stabilizes training for long-form reasoning in low-resource edge systems without increasing model size or compute.

## Contribution

AF-CuRL introduces answer-focused token reweighting and a curriculum reward schedule for stable long-form reasoning in resource-constrained settings.

## Key findings

- AF-CuRL improves mathematical reasoning accuracy and output regularity on a 1.5B-parameter model.
- The framework achieves stable training without increasing model size or computational cost.
- Structured objective design can be more effective than model scaling for long-form reasoning in low-resource systems.

## Abstract

What are the main findings?
- Proposes AF-CuRL, a lightweight reinforcement learning framework that improves training stability for long-form generation under low-resource constraints.
- Demonstrates consistent gains in mathematical reasoning accuracy and output regularity on a 1.5B-parameter model without increasing model size or compute.

What are the implications of the main findings?
- Shows that objective-level design, rather than model scaling, is critical for effective reinforcement learning in low-resource long-form generation.
- Provides a practical and reproducible reinforcement learning approach applicable to resource-constrained reasoning tasks.

Resource-constrained intelligent systems increasingly require reliable long-form reasoning capabilities under limited computational and memory budgets, particularly in edge and embedded sensing environments. However, reinforcement learning for long-horizon decision generation remains highly unstable in such low-resource settings due to severe reward sparsity and imbalanced credit assignment, which often lead to non-convergent or excessively verbose generation behavior. In this work, we propose AF-CuRL (Answer-Focused Curriculum Reinforcement Learning), a lightweight reinforcement learning framework designed to stabilize long-form generation without increasing model size or computational cost. AF-CuRL improves optimization learnability through two complementary objective-level designs: (1) answer-focused token reweighting, which concentrates policy updates on reward-critical regions of generated sequences to alleviate credit assignment imbalance, and (2) a two-phase curriculum reward schedule that prioritizes stable termination and output regularity before shifting toward correctness-oriented optimization. We evaluate AF-CuRL on a 1.5B-parameter language model under strictly constrained training settings, using mathematical reasoning tasks as a controlled and reproducible proxy for long-horizon, rule-based decision-making commonly encountered in intelligent sensing and embedded systems. Experimental results demonstrate consistent improvements in both decision accuracy and generation regularity, including higher termination reliability and reduced generation length, compared with standard sequence-level reinforcement learning baselines. These results suggest that, for resource-limited and edge-intelligent systems, structured objective design can be more effective than model scaling for achieving stable and efficient long-form reasoning, providing a practical reinforcement learning solution for intelligent systems operating under real-world constraints.
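The two objective-level designs named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so every function name, the weight of 2.0 on answer tokens, the 0.8/0.2 reward mixes, and the `switch_step` threshold are hypothetical choices made only for illustration. The sketch also assumes the "reward-critical region" is the final answer span and uses a plain REINFORCE-style surrogate loss.

```python
from dataclasses import dataclass
from typing import List


def answer_focused_weights(seq_len: int, answer_start: int,
                           answer_weight: float = 2.0) -> List[float]:
    """Per-token weights that emphasise the answer span (hypothetical scheme).

    Tokens at or after `answer_start` get `answer_weight`, all others 1.0.
    Weights are rescaled to mean 1 so the overall loss scale is unchanged.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    raw = [answer_weight if t >= answer_start else 1.0 for t in range(seq_len)]
    mean = sum(raw) / seq_len
    return [w / mean for w in raw]


def weighted_pg_loss(logprobs: List[float], advantage: float,
                     weights: List[float]) -> float:
    """REINFORCE-style surrogate: -A * mean_t(w_t * log pi(y_t))."""
    if len(logprobs) != len(weights):
        raise ValueError("logprobs and weights must have the same length")
    total = sum(w * lp for w, lp in zip(weights, logprobs))
    return -advantage * total / len(logprobs)


@dataclass
class Rollout:
    terminated: bool   # the model emitted an end-of-sequence token
    well_formed: bool  # the output matched the expected answer format
    correct: bool      # the final answer matched the reference


def curriculum_reward(rollout: Rollout, step: int,
                      switch_step: int = 1000) -> float:
    """Two-phase reward (assumed mixing weights, not taken from the paper).

    Phase 1 (step < switch_step): mostly rewards termination and format.
    Phase 2 (step >= switch_step): mostly rewards correctness.
    """
    regularity = 0.5 * rollout.terminated + 0.5 * rollout.well_formed
    correctness = 1.0 if rollout.correct else 0.0
    if step < switch_step:
        w_regularity, w_correctness = 0.8, 0.2
    else:
        w_regularity, w_correctness = 0.2, 0.8
    return w_regularity * regularity + w_correctness * correctness


if __name__ == "__main__":
    weights = answer_focused_weights(seq_len=4, answer_start=2, answer_weight=3.0)
    print(weights)
    print(weighted_pg_loss([-1.0, -1.0, -1.0, -1.0], advantage=1.0, weights=weights))
    tidy_but_wrong = Rollout(terminated=True, well_formed=True, correct=False)
    print(curriculum_reward(tidy_but_wrong, step=0))
    print(curriculum_reward(tidy_but_wrong, step=2000))
```

Under these assumed settings, a well-formed but incorrect rollout earns most of the available reward early in training and little of it after the switch, which mirrors the abstract's ordering of stable termination first and correctness second. Normalising the token weights to mean 1 is a design choice of this sketch so that reweighting redistributes gradient toward the answer span without changing the effective learning rate.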

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12986881/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12986881/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12986881/full.md

---
Source: https://tomesphere.com/paper/PMC12986881