Human-in-the-Loop Testing of AI Agents for Air Traffic Control with a Regulated Assessment Framework
Ben Carvell, Marc Thomas, Andrew Pace, Christopher Dorney, George De Ath, Richard Everson, Nick Pepper, Adam Keane, Samuel Tomlinson, Richard Cannon

TL;DR
This paper introduces a human-in-the-loop evaluation framework for AI agents in Air Traffic Control, built on a regulator-certified, simulator-based curriculum so that agents are assessed against the same standards as real-world trainee controllers.
Contribution
It presents an assessment framework grounded in legally regulated assessments and involving expert human instructors, bridging the gap between academic representations of Air Traffic Control and the actual operational environment.
Findings
Framework enables authentic performance measurement
Involves expert human instructors in evaluation
Aligns AI assessment with real-world standards
Abstract
We present a rigorous, human-in-the-loop evaluation framework for assessing the performance of AI agents on the task of Air Traffic Control, grounded in a regulator-certified simulator-based curriculum used for training and testing real-world trainee controllers. By leveraging legally regulated assessments and involving expert human instructors in the evaluation process, our framework enables a more authentic and domain-accurate measurement of AI performance. This work addresses a critical gap in the existing literature: the frequent misalignment between academic representations of Air Traffic Control and the complexities of the actual operational environment. It also lays the foundations for effective future human-machine teaming paradigms by aligning machine performance with human assessment targets.
Taxonomy
Topics: Human-Automation Interaction and Safety · Air Traffic Management and Optimization · Aerospace and Aviation Technology
