Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts

Toshihide Ubukata; Zhiyao Wang; Enhong Mu; Jialong Li; Kenji Tei

arXiv:2602.19244·cs.AI·February 24, 2026

TL;DR

This paper introduces a Soft Mixture-of-Experts framework for reinforcement-learning-based directed controller synthesis. It combines multiple RL experts through a prior-confidence gating mechanism, improving robustness and expanding the solvable parameter space on the Air Traffic benchmark.

Contribution

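The paper proposes a Soft Mixture-of-Experts approach that combines multiple RL experts through a prior-confidence gating mechanism, treating the anisotropic generalization of individual RL policies in controller synthesis as complementary specializations.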

Findings

01. Substantially expands the solvable parameter space.

02. Improves robustness over single-expert RL methods.

03. Effective on the Air Traffic benchmark.

Abstract

On-the-fly Directed Controller Synthesis (OTF-DCS) mitigates state-space explosion by incrementally exploring the system and relies critically on an exploration policy to guide search efficiently. Recent reinforcement learning (RL) approaches learn such policies and achieve promising zero-shot generalization from small training instances to larger unseen ones. However, a fundamental limitation is anisotropic generalization, where an RL policy exhibits strong performance only in a specific region of the domain-parameter space while remaining fragile elsewhere due to training stochasticity and trajectory-dependent bias. To address this, we propose a Soft Mixture-of-Experts framework that combines multiple RL experts via a prior-confidence gating mechanism and treats these anisotropic behaviors as complementary specializations. The evaluation on the Air Traffic benchmark shows that…
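To make the gating idea concrete, here is a minimal sketch of how several experts' scores over candidate exploration actions might be blended softly. It is not the paper's formulation, which the abstract does not spell out. The function names, the form of the gate (weight proportional to prior times exp(confidence / temperature)), and the `temperature` parameter are all assumptions introduced for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(priors: Sequence[float],
                 confidences: Sequence[float],
                 temperature: float = 1.0) -> List[float]:
    """Hypothetical soft gate: weight_k is proportional to
    prior_k * exp(confidence_k / temperature).

    `priors` are assumed to be positive per-expert prior weights and
    `confidences` per-expert confidence scores for the current state.
    """
    logits = [math.log(p) + c / temperature
              for p, c in zip(priors, confidences)]
    return softmax(logits)


def mix_scores(expert_scores: Sequence[Sequence[float]],
               weights: Sequence[float]) -> List[float]:
    """Weighted sum of each expert's score for every candidate action."""
    n_actions = len(expert_scores[0])
    return [sum(w * scores[a] for w, scores in zip(weights, expert_scores))
            for a in range(n_actions)]


def select_action(expert_scores: Sequence[Sequence[float]],
                  priors: Sequence[float],
                  confidences: Sequence[float],
                  temperature: float = 1.0) -> int:
    """Pick the index of the action with the highest mixed score."""
    weights = gate_weights(priors, confidences, temperature)
    mixed = mix_scores(expert_scores, weights)
    return max(range(len(mixed)), key=mixed.__getitem__)


if __name__ == "__main__":
    # Two hypothetical experts scoring two candidate actions.
    scores = [[1.0, 0.0], [0.0, 0.5]]
    print(gate_weights([0.5, 0.5], [0.0, 0.0]))
    print(select_action(scores, [0.5, 0.5], [0.0, 10.0]))
```

Because the gate is soft, every expert contributes in proportion to its weight rather than a single expert being chosen outright. In this sketch, that is how an expert that is strong in one region of the parameter space can dominate there without silencing the others elsewhere.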

Taxonomy

Topics: Air Traffic Management and Optimization · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics