Combining AI Control Systems and Human Decision Support via Robustness and Criticality
Walt Woods, Alexander Grushin, Simon Khan, Alvaro Velasquez

TL;DR
This paper presents a robust AI control system integrated with human decision support. It uses adversarial explanations and autoencoders to improve safety, robustness, and training in reinforcement learning frameworks such as MuZero.
Contribution
It extends the adversarial explanation (AE) methodology to state-of-the-art reinforcement learning frameworks, introduces autoencoders for saliency detection, and combines these with criticality analysis to improve human-AI collaboration.
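The handoff between the AI control system and a human co-decider can be pictured as a criticality gate. The sketch below is a minimal illustration under assumptions not taken from the paper. It measures criticality as the spread between the best and worst estimated action values, which is only a simple proxy. The threshold and the function names `criticality` and `choose_decider` are hypothetical. The paper's own criticality measure may differ.

```python
from typing import Sequence


def criticality(action_values: Sequence[float]) -> float:
    """Hypothetical criticality proxy: gap between the best and worst action values.

    A large gap means the choice of action has a large effect on expected return,
    so the state is treated as critical. (Assumption, not the paper's definition.)
    """
    if not action_values:
        raise ValueError("need at least one action value")
    return max(action_values) - min(action_values)


def choose_decider(action_values: Sequence[float], threshold: float) -> str:
    """Route a decision: the agent acts alone in low-criticality states and a
    human co-decider is consulted when criticality exceeds `threshold`."""
    return "human" if criticality(action_values) > threshold else "agent"


if __name__ == "__main__":
    # Nearly equal action values: little is at stake, so the agent decides.
    print(choose_decider([0.50, 0.48, 0.47], threshold=0.3))  # agent
    # Widely spread action values: a wrong choice is costly, so defer to a human.
    print(choose_decider([0.90, 0.10, -0.40], threshold=0.3))  # human
```

In practice the threshold would trade off human workload against safety; a lower value defers to the human more often.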
Findings
AI control systems show increased robustness against adversarial tampering.
Adversarial explanations effectively aid human decision-making.
The integrated system enhances AI decision quality and safety in critical scenarios.
Abstract
AI-enabled capabilities are reaching the requisite level of maturity to be deployed in the real world, yet do not always make correct or safe decisions. One way of addressing these concerns is to leverage AI control systems alongside and in support of human decisions, relying on the AI control system in safe situations while calling on a human co-decider for critical situations. We extend a methodology for adversarial explanations (AE) to state-of-the-art reinforcement learning frameworks, including MuZero. Multiple improvements to the base agent architecture are proposed. We demonstrate how this technology has two applications: for intelligent decision tools and to enhance training / learning frameworks. In a decision support context, adversarial explanations help a user make the correct decision by highlighting those contextual factors that would need to change for a different…
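The idea of highlighting the contextual factors that would need to change for a different decision can be illustrated with a deliberately simplified model. The sketch below is not the paper's method, which operates on deep, adversarially robust networks. It assumes a hypothetical two-action linear policy (action 1 if `w @ x + c > 0`, else action 0). For such a policy, the smallest L2 change to the state that flips the decision has a closed form, the same step DeepFool takes for a linear binary classifier. The weights, state, `overshoot` value, and function names are illustrative assumptions.

```python
import numpy as np


def action(w: np.ndarray, c: float, x: np.ndarray) -> int:
    """Hypothetical two-action linear policy: 1 if the score is positive, else 0."""
    return int(w @ x + c > 0)


def linear_counterfactual(w: np.ndarray, c: float, x: np.ndarray,
                          overshoot: float = 0.02) -> np.ndarray:
    """Smallest L2 change `delta` to state `x` that flips the linear policy.

    The closest point on the decision boundary lies along w, at distance
    |w @ x + c| / ||w||; `overshoot` pushes slightly past the boundary.
    """
    norm_sq = float(w @ w)
    if norm_sq == 0.0:
        raise ValueError("a zero weight vector has no decision boundary")
    score = float(w @ x + c)
    return -(1.0 + overshoot) * score * w / norm_sq


if __name__ == "__main__":
    w = np.array([2.0, -1.0, 0.0])  # hypothetical weights over 3 state features
    c = -0.5
    x = np.array([1.0, 0.5, 3.0])   # hypothetical current state
    delta = linear_counterfactual(w, c, x)
    print(action(w, c, x), action(w, c, x + delta))  # 1 0
    # Nonzero entries of delta are the factors that would need to change;
    # the third feature has zero weight, so it plays no part in the explanation.
    print(delta)
```

For nonlinear policies there is no closed form, and the perturbation is usually found by gradient-based search instead.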
Taxonomy
Methods: Residual Connection · Batch Normalization · Average Pooling · Residual Block · Monte-Carlo Tree Search · Prioritized Experience Replay · Convolution · MuZero · Balanced Selection
