A Reinforcement Learning Approach to the Stochastic Cutting Stock Problem
Anselmo R. Pitombeira-Neto, Arthur H. Fonseca Murta

TL;DR
This paper models the stochastic cutting stock problem as a discounted infinite-horizon Markov decision process and introduces a reinforcement learning heuristic to derive cost-effective inventory policies, reporting cost reductions of up to 80% relative to myopic policies.
Contribution
It develops a novel reinforcement learning-based heuristic for the stochastic cutting stock problem, motivated by the fact that exact algorithms scale exponentially with the state-space dimension.
Findings
Reinforcement learning policies reduce costs by up to 80% compared to myopic policies.
Approximate policy iteration with linear models effectively manages large decision spaces.
Simulation-based evaluation demonstrates practical applicability with realistic data.
Abstract
We propose a formulation of the stochastic cutting stock problem as a discounted infinite-horizon Markov decision process. At each decision epoch, given current inventory of items, an agent chooses in which patterns to cut objects in stock in anticipation of the unknown demand. An optimal solution corresponds to a policy that associates each state with a decision and minimizes the expected total cost. Since exact algorithms scale exponentially with the state-space dimension, we develop a heuristic solution approach based on reinforcement learning. We propose an approximate policy iteration algorithm in which we apply a linear model to approximate the action-value function of a policy. Policy evaluation is performed by solving the projected Bellman equation from a sample of state transitions, decisions and costs obtained by simulation. Due to the large decision space, policy improvement…
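The policy evaluation step described in the abstract (fitting a linear action-value model by solving the projected Bellman equation from sampled transitions) can be illustrated with a least-squares temporal-difference sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the single-item toy inventory model, the cost coefficients, the one-hot features, the ridge term, and the names `lstdq`, `toy_features`, `simulate_transitions` and `cut_nothing` are all hypothetical and do not come from the paper.

```python
import numpy as np


def lstdq(samples, features, policy, n_features, gamma=0.9, ridge=1e-6):
    """Estimate weights w such that Q_pi(s, a) is approximated by features(s, a) @ w.

    Solves the sampled projected Bellman equation A w = b, where
      A = sum phi(s, a) (phi(s, a) - gamma * phi(s', pi(s')))^T
      b = sum cost * phi(s, a)
    The small ridge term (an assumption) keeps A invertible for unvisited pairs.
    """
    A = ridge * np.eye(n_features)
    b = np.zeros(n_features)
    for state, decision, cost, next_state in samples:
        phi = features(state, decision)
        phi_next = features(next_state, policy(next_state))
        A += np.outer(phi, phi - gamma * phi_next)
        b += cost * phi
    return np.linalg.solve(A, b)


def toy_features(inventory, n_cut, max_inventory=4, max_cut=2):
    """Hypothetical one-hot features over (inventory level, number of items cut)."""
    phi = np.zeros((max_inventory + 1) * (max_cut + 1))
    phi[inventory * (max_cut + 1) + n_cut] = 1.0
    return phi


def simulate_transitions(n_samples, rng, max_inventory=4, max_cut=2):
    """Simulate a toy single-item inventory with random decisions and random demand."""
    samples = []
    inventory = 0
    for _ in range(n_samples):
        n_cut = int(rng.integers(0, max_cut + 1))  # exploratory decision
        demand = int(rng.integers(0, 3))           # demand in {0, 1, 2}
        available = min(inventory + n_cut, max_inventory)
        sold = min(available, demand)
        next_inventory = available - sold
        # Hypothetical costs: cutting, holding leftover items, unmet demand.
        cost = 1.0 * n_cut + 0.5 * next_inventory + 4.0 * (demand - sold)
        samples.append((inventory, n_cut, cost, next_inventory))
        inventory = next_inventory
    return samples


def cut_nothing(inventory):
    """Hypothetical baseline policy that never cuts in anticipation of demand."""
    return 0


rng = np.random.default_rng(0)
samples = simulate_transitions(2000, rng)
weights = lstdq(samples, toy_features, cut_nothing, n_features=15)
```

In an approximate policy iteration loop, the fitted weights would then be used to pick, in each state, the decision with the lowest estimated action value, and the evaluation would be repeated for the improved policy. How the paper performs that improvement step over its large decision space is not covered by this sketch.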
