Mechanistic Interpretability for Learning Assurance of a Vision-Based Landing System
Romeo Valentin, Olivia Beyer Bruvik, Marc R. Schlichting, Mykel J. Kochenderfer

TL;DR
This paper proposes a mechanistic interpretability approach for vision-based aircraft landing systems, demonstrating how content and style separation in neural network representations can provide concrete safety assurance evidence.
Contribution
It introduces a method to decompose neural network embeddings into interpretable atoms, enabling runtime assurance through content/style separation and out-of-model-scope detection.
Findings
Contentful atoms track task-relevant runway structures
Stylistic atoms capture domain-specific appearance
Regression head relies mainly on contentful atoms
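One way to make the last finding concrete is sketched below. It is an illustration under stated assumptions and not necessarily how the paper measures reliance: it assumes a linear regression head with weight vector w, an embedding approximated as D @ a (dictionary D, sparse code a), and a hypothetical list content_idx marking which atoms were labeled contentful. Under those assumptions the head output w @ (D @ a) splits exactly into per-atom terms a[k] * (w @ D[:, k]).

```python
import numpy as np

def atom_contributions(w, D, a):
    """Per-atom terms of a linear head's output w @ (D @ a).

    w: (dim,) head weights (assumed linear head, an assumption of this sketch)
    D: (dim, n_atoms) dictionary, one atom per column
    a: (n_atoms,) sparse code of one embedding
    """
    return a * (D.T @ w)

def content_share(w, D, a, content_idx):
    """Fraction of total absolute contribution carried by the atoms in content_idx."""
    c = np.abs(atom_contributions(w, D, a))
    total = c.sum()
    return float(c[content_idx].sum() / total) if total > 0 else 0.0

# Toy example with hypothetical values: atom 0 is "content", atoms 1 and 2 are "style".
D = np.eye(3)
a = np.array([2.0, 1.0, 1.0])
w = np.array([1.0, 0.0, 1.0])
contrib = atom_contributions(w, D, a)
share = content_share(w, D, a, content_idx=[0])
```

A share close to 1 would indicate that the output is driven mostly by the atoms labeled contentful. For a nonlinear head, this exact decomposition no longer holds and a different attribution method would be needed.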
Abstract
EASA's learning-assurance guidance requires data-driven aviation systems to build and monitor their own situation representation, yet for neural networks the technical means to provide such evidence remain an open problem. We address this gap for a vision-based aircraft landing system: we propose that a minimally assurable model must at least be shown to separate content from style in its own situation representation. Showing that the model's predictions then rely largely on the contentful representation components leads to a concrete assurance path. To demonstrate this assurance path on a concrete model, we train a vision transformer model for runway keypoint regression on the LARDv2 dataset. The model, which acts as the subject for our assurance demonstration, produces per-patch embeddings that we decompose into interpretable atoms via K-SVD sparse dictionary learning. A qualitative…
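The K-SVD step named in the abstract can be sketched as follows. This is a minimal NumPy illustration of textbook K-SVD with a greedy Orthogonal Matching Pursuit coder, not the authors' implementation. The function names, the atom count, the sparsity level, and the random matrix standing in for per-patch embeddings are all assumptions made for the example.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy Orthogonal Matching Pursuit: sparse code of x over unit-norm atoms (columns of D)."""
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with the residual
        if k in support:
            break
        support.append(k)
        # Re-fit all selected atoms jointly by least squares
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

def ksvd(X, n_atoms, n_nonzero, n_iter=10, seed=0):
    """X: (dim, n_samples). Returns dictionary D (dim, n_atoms) and codes A (n_atoms, n_samples)."""
    rng = np.random.default_rng(seed)
    # Initialise atoms from randomly chosen samples, normalised to unit length
    idx = rng.choice(X.shape[1], size=n_atoms, replace=False)
    D = X[:, idx].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse coding step: code every sample with the current dictionary
        A = np.stack([omp(D, X[:, i], n_nonzero) for i in range(X.shape[1])], axis=1)
        # Dictionary update step: refit one atom at a time via a rank-1 SVD
        for k in range(n_atoms):
            users = np.flatnonzero(A[k])
            if users.size == 0:
                continue  # atom unused in this round; leave it unchanged
            # Reconstruction error on the samples using atom k, with atom k's own term added back
            E = X[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            A[k, users] = s[0] * Vt[0]
    return D, A

# Random data as a stand-in for per-patch embeddings (16-dim, 200 patches)
X = np.random.default_rng(0).standard_normal((16, 200))
D, A = ksvd(X, n_atoms=32, n_nonzero=3, n_iter=5)
```

Each column of D is one atom, and each column of A holds the few nonzero coefficients that reconstruct one embedding. Interpreting an atom as contentful or stylistic would be a separate, qualitative step that this sketch does not attempt.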
