Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs
Georgios Tzannetos, Parameswaran Kamalaruban, Adish Singla

TL;DR
This paper introduces a curriculum learning approach that gradually enforces trajectory constraints during training, improving agent performance and enabling token compression in LLMs for faster inference under resource limits.
Contribution
We propose a novel curriculum strategy for training constrained agents that accelerates learning and enables token compression in LLMs, with theoretical and empirical validation across multiple domains.
Findings
Accelerates RL training under trajectory constraints.
Enables significant token compression in LLMs.
Improves inference speed on consumer hardware.
Abstract
Training agents to operate under strict constraints during deployment, such as limited resource budgets or stringent safety requirements, presents significant challenges, especially when these constraints render the task complex. In this work, we propose a curriculum learning strategy that gradually tightens constraints during training, enabling the agent to incrementally master the deployment requirements. Inspired by self-paced learning techniques in unconstrained reinforcement learning (RL), our approach facilitates a smoother transition to challenging environments by initially training on simplified versions of the constraints and progressively introducing the full deployment conditions. We provide a theoretical analysis using an RL agent in a binary-tree Markov Decision Process (MDP) to demonstrate that our curriculum strategy can accelerate training relative to a baseline approach…
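As a rough illustration of the idea in the abstract, the Python sketch below tightens a trajectory-level budget from a relaxed value toward the deployment value. An example of such a budget is the total number of chain-of-thought tokens a trajectory may spend. The budget tightens only after the agent succeeds often enough at the current level. This is not the authors' algorithm. The names `ConstraintCurriculum` and `satisfies_constraint`, the linear interpolation between budgets, and the success-rate gate (a simple stand-in for a self-paced criterion) are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


def satisfies_constraint(step_costs: Sequence[int], budget: int) -> bool:
    """Hypothetical trajectory-level constraint: total cost must fit the budget.

    Each entry of `step_costs` could be, for example, the number of tokens
    emitted at one reasoning step; the check applies to the whole trajectory.
    """
    return sum(step_costs) <= budget


@dataclass
class ConstraintCurriculum:
    """Hypothetical curriculum that tightens a trajectory budget in stages.

    Stage 0 uses `relaxed_budget`; the final stage uses the deployment
    `target_budget`. The stage advances only after the success rate over the
    last `window` episodes reaches `advance_threshold`.
    """

    relaxed_budget: int
    target_budget: int
    num_stages: int
    advance_threshold: float = 0.8
    window: int = 20
    stage: int = 0
    _recent: List[bool] = field(default_factory=list, init=False, repr=False)

    def budget(self) -> int:
        """Budget for the current stage, linearly interpolated between the ends."""
        if self.num_stages <= 1:
            return self.target_budget
        delta = self.target_budget - self.relaxed_budget
        return round(self.relaxed_budget + delta * self.stage / (self.num_stages - 1))

    def record(self, success: bool) -> None:
        """Log one episode; `success` means the task was solved within budget()."""
        self._recent.append(success)
        if len(self._recent) > self.window:
            self._recent.pop(0)  # keep only the most recent `window` episodes
        window_full = len(self._recent) == self.window
        rate = sum(self._recent) / len(self._recent)
        if window_full and rate >= self.advance_threshold:
            if self.stage < self.num_stages - 1:
                self.stage += 1  # tighten the constraint
                self._recent.clear()  # re-measure performance at the new budget


if __name__ == "__main__":
    cur = ConstraintCurriculum(relaxed_budget=1024, target_budget=256, num_stages=4, window=5)
    print(cur.budget())  # 1024
    for _ in range(5):
        # For illustration only, treat staying within budget as success.
        cur.record(satisfies_constraint([300, 200], cur.budget()))
    print(cur.stage, cur.budget())  # 1 768
```

The gate holds the agent at a given budget until it performs reliably there. This mirrors the abstract's motivation of mastering simplified constraints before facing the full deployment conditions. A schedule that tightens on a fixed timetable would be a simpler, performance-blind alternative.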
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
