RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States

Xiangjie Xiao; Cong Zhang; Wen Song; Zhiguang Cao

arXiv:2603.07020 · cs.LG · March 10, 2026

Open Access · 3 Reviews

TL;DR

ReSched introduces a simplified, Transformer-based deep reinforcement learning framework for flexible job shop scheduling, reducing state complexity and improving generalization across related scheduling problems.

Contribution

It redefines the FJSP formulation with a minimal feature set and employs a Transformer architecture, advancing a more generalizable and less complex scheduling framework.

Findings

01. Outperforms classical dispatching rules and state-of-the-art DRL methods on FJSP.

02. Generalizes effectively to JSSP and FFSP with competitive results.

03. Reduces state representation complexity to four features.

Abstract

Neural approaches to the Flexible Job Shop Scheduling Problem (FJSP), particularly those based on deep reinforcement learning (DRL), have gained growing attention in recent years. However, existing methods rely on complex feature-engineered state representations (i.e., often requiring more than 20 handcrafted features) and graph-biased neural architectures. To reduce modeling complexity and advance a more generalizable framework for FJSP, we introduce ReSched, a minimalist DRL framework that rethinks both the scheduling formulation and model design. First, by revisiting the Markov Decision Process (MDP) formulation of FJSP, we condense the state space to just four essential features, eliminating historical dependencies through a subproblem-based perspective. Second, we employ Transformer blocks with dot-product attention, augmented by three lightweight but effective…
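The dot-product attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `masked_attention`, the toy vectors, and the use of an operation-machine eligibility mask (a notion raised in the reviews below) are assumptions made purely for illustration, and the three augmentations referred to in the abstract are not modeled.

```python
import math

def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention with a boolean mask (illustrative only).

    queries: list of n vectors of length d
    keys, values: lists of m vectors
    mask[i][j] is True when query i may attend to key j;
    every row of mask must contain at least one True.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q, row_mask in zip(queries, mask):
        # Score q . k / sqrt(d) for allowed keys, -inf for masked ones.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if ok else -math.inf
            for k, ok in zip(keys, row_mask)
        ]
        # Numerically stable softmax; masked entries receive weight 0.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights

# Hypothetical toy data: 2 operation embeddings, 3 machine embeddings.
ops = [[1.0, 0.0], [0.0, 1.0]]
machines = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Assumed eligibility: operation 0 can run on machines 0 and 2, operation 1 only on machine 1.
eligible = [[True, False, True], [False, True, False]]

out, w = masked_attention(ops, machines, machines, eligible)
```

In this toy setup an operation assigns zero attention weight to any machine it cannot run on, so ineligible machines never contribute to its updated representation.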

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The paper has good results on large instances. A new state representation and architecture are proposed.

Weaknesses

The writing is not polished (Experments, Dateset, ...). The training times of the different DRL approaches are not compared. It would be better to also have OR-Tools results on the Taillard benchmark.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The paper shows that strong performance is possible with a small, domain-aware feature set. The structure-aware Transformer is well aligned with precedence and operation-machine eligibility. The paper demonstrates sample efficiency and generalization across instance sizes in FJSP, one of the most challenging COPs.

Weaknesses

The “minimal MDP” claim (Proposition 1) lacks a formal proof; it relies mainly on empirical results. Related work on feature minimization is incomplete: Lee & Kim (2024) already aimed to remove historical dependencies and relative time feature (global minimum available time subtraction) in JSSP, which is conceptually aligned with the “State: SubProblem” design (line 229 in this paper). The authors analyze only DANIEL (Wang et al., 2024b) features in FJSP, but feature minimization in JSSP had already been explor

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The paper strikes a balanced narrative between reviewing foundational concepts of the FJSSP and introducing the method. Explanations are concise and accessible, enabling readers to follow both theoretical background and model design decisions.
- Although the transformer architecture itself is not new, the authors extend it with relevant innovations (such as rotary positional encodings) and tailor the input representation to the scheduling context. Further, the authors introduce a deliberate

Weaknesses

- The experimental evaluation lacks a comparison of computation time between the proposed method and other neural network–based approaches. This is particularly relevant because the method supports sampling-based inference, which may introduce higher computational cost. A runtime analysis (such as wall-clock time per episode or inference step) would clarify the trade-off between solution quality and computational efficiency and help position the method against deep learning baselines.
- In Figu

Taxonomy

Topics: Scheduling and Optimization Algorithms · Smart Grid Energy Management · Advanced Queuing Theory Analysis