Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts
Toshihide Ubukata, Zhiyao Wang, Enhong Mu, Jialong Li, Kenji Tei

TL;DR
This paper introduces a Soft Mixture-of-Experts framework for reinforcement learning-based directed controller synthesis. By combining multiple RL experts, it improves robustness and expands the solvable parameter space on the Air Traffic benchmark.
Contribution
The paper proposes a Soft Mixture-of-Experts approach that combines multiple RL experts through a prior-confidence gating mechanism to address anisotropic generalization in RL-based controller synthesis.
Findings
Substantially expands the solvable parameter space.
Improves robustness over single-expert RL methods.
Demonstrates these gains on the Air Traffic benchmark.
Abstract
On-the-fly Directed Controller Synthesis (OTF-DCS) mitigates state-space explosion by incrementally exploring the system and relies critically on an exploration policy to guide search efficiently. Recent reinforcement learning (RL) approaches learn such policies and achieve promising zero-shot generalization from small training instances to larger unseen ones. However, a fundamental limitation is anisotropic generalization, where an RL policy exhibits strong performance only in a specific region of the domain-parameter space while remaining fragile elsewhere due to training stochasticity and trajectory-dependent bias. To address this, we propose a Soft Mixture-of-Experts framework that combines multiple RL experts via a prior-confidence gating mechanism and treats these anisotropic behaviors as complementary specializations. The evaluation on the Air Traffic benchmark shows that…
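The gating idea described in the abstract can be illustrated with a minimal sketch. The excerpt does not specify how prior confidence is computed or how expert outputs are mixed, so everything below is an assumption for illustration only: each expert is assumed to emit raw scores over the same candidate actions, prior confidence is assumed to be one scalar per expert (for example, derived from how well that expert did near the current domain parameters), and the mixture is assumed to be a convex combination of per-expert action distributions. The names `softmax`, `soft_moe_scores`, and `select_action` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a lower temperature sharpens the output."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_moe_scores(
    expert_scores: Sequence[Sequence[float]],
    prior_confidence: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Mix per-expert action distributions with soft gate weights.

    expert_scores[i][a] is expert i's raw score for candidate action a.
    prior_confidence[i] is an assumed scalar confidence for expert i.
    """
    if len(expert_scores) != len(prior_confidence):
        raise ValueError("one confidence value is needed per expert")
    # Soft gate: every expert keeps a non-zero weight (no hard routing).
    gate = softmax(prior_confidence, temperature)
    # Normalise each expert's scores so they are comparable before mixing.
    per_expert = [softmax(scores) for scores in expert_scores]
    n_actions = len(expert_scores[0])
    return [
        sum(g * dist[a] for g, dist in zip(gate, per_expert))
        for a in range(n_actions)
    ]


def select_action(mixed: Sequence[float]) -> int:
    """Return the index of the action with the highest mixed probability."""
    return max(range(len(mixed)), key=lambda a: mixed[a])


# Two hypothetical experts that disagree about which action to explore next.
scores = [[2.0, 0.0], [0.0, 2.0]]
action_when_first_trusted = select_action(soft_moe_scores(scores, [5.0, 0.0]))
action_when_second_trusted = select_action(soft_moe_scores(scores, [0.0, 5.0]))
```

In this sketch the gate never fully discards an expert, so a policy that is weak in one region of the parameter space still contributes a small vote there. That is one plausible reading of treating anisotropic behaviors as complementary specializations, not a statement of the authors' actual design.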
Taxonomy
Topics: Air Traffic Management and Optimization · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
