Online Action-Stacking Improves Reinforcement Learning Performance for Air Traffic Control
Ben Carvell, George De Ath, Eseoghene Benjamin, Richard Everson

TL;DR
This paper presents online action-stacking, an inference-time wrapper that lets reinforcement learning policies for air traffic control train on a much smaller discrete action space while still issuing realistic compound commands.
Contribution
Introduces online action-stacking, an inference-time wrapper that compiles bursts of primitive actions into domain-appropriate compound clearances, so that RL policies for complex ATC tasks can be trained on a much smaller discrete action space.
Findings
Reduces instruction frequency compared to baseline
Achieves similar performance with fewer actions
Facilitates scaling to complex control scenarios
Abstract
We introduce online action-stacking, an inference-time wrapper for reinforcement learning policies that produces realistic air traffic control commands while allowing training on a much smaller discrete action space. Policies are trained with simple incremental heading or level adjustments, together with an action-damping penalty that reduces instruction frequency and leads agents to issue commands in short bursts. At inference, online action-stacking compiles these bursts of primitive actions into domain-appropriate compound clearances. Using Proximal Policy Optimisation and the BluebirdDT digital twin platform, we train agents to navigate aircraft along lateral routes, manage climb and descent to target flight levels, and perform two-aircraft collision avoidance under a minimum separation constraint. In our lateral navigation experiments, action stacking greatly reduces the number of…
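The compilation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes primitive actions are signed heading increments in degrees, that 0 encodes "no instruction", and that a burst ends at the first no-op. The names `NO_OP` and `stack_actions` are hypothetical. For clarity it processes a recorded action sequence, whereas the wrapper in the paper operates online at inference time.

```python
from typing import Iterable, List, Optional, Tuple

NO_OP = 0  # assumed encoding: 0 means "issue no instruction this step"


def stack_actions(heading_deltas: Iterable[int]) -> List[Tuple[int, int]]:
    """Merge each run of consecutive non-zero heading increments into one command.

    Returns (start_step, total_delta_degrees) pairs, one per burst.
    """
    commands: List[Tuple[int, int]] = []
    burst_start: Optional[int] = None
    burst_total = 0
    for step, delta in enumerate(heading_deltas):
        if delta != NO_OP:
            # Inside a burst: remember where it began and accumulate the change.
            if burst_start is None:
                burst_start = step
            burst_total += delta
        elif burst_start is not None:
            # A no-op closes the current burst, which becomes one compound command.
            commands.append((burst_start, burst_total))
            burst_start, burst_total = None, 0
    if burst_start is not None:
        # Flush a burst that runs to the end of the sequence.
        commands.append((burst_start, burst_total))
    return commands


# Illustrative trace: three +5 degree steps, a pause, then two -5 degree steps.
trace = [5, 5, 5, 0, 0, -5, -5, 0]
print(stack_actions(trace))  # [(0, 15), (5, -10)]
```

In this toy trace, five primitive actions collapse into two compound heading changes, which is the kind of reduction in issued instructions the abstract refers to. A level-change variant would accumulate flight-level increments in the same way.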
Taxonomy
Topics: Air Traffic Management and Optimization · Aerospace and Aviation Technology · Reinforcement Learning in Robotics
