Bilinear Convolution Decomposition for Causal RL Interpretability
Narmeen Oozeer, Sinem Erisken, Alice Rigg

TL;DR
This paper introduces bilinear convolutional models for reinforcement learning whose weights can be decomposed analytically, so that interpretability claims about the model's decisions can be validated causally rather than only correlationally.
Contribution
The paper proposes replacing the nonlinearities in RL ConvNets with bilinear variants, which enables analytic, weight-based decomposition and causal validation of interpretability findings.
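As a rough illustration of what "replacing a nonlinearity with a bilinear variant" can mean for a convolution layer, the sketch below multiplies the outputs of two convolutions elementwise instead of passing one convolution through an activation function. The names `conv2d`, `bilinear_conv2d`, `w_left`, and `w_right` are hypothetical, and the omission of biases, padding, and normalization is an assumption; the paper's exact layer may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, w):
    """Valid cross-correlation. x: (C_in, H, W); w: (C_out, C_in, k, k)."""
    k = w.shape[-1]
    # patches has shape (C_in, H - k + 1, W - k + 1, k, k)
    patches = sliding_window_view(x, (k, k), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", patches, w)


def bilinear_conv2d(x, w_left, w_right):
    """Hypothetical bilinear layer: elementwise product of two convolutions."""
    return conv2d(x, w_left) * conv2d(x, w_right)


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))            # toy input: 3 channels, 8x8
w_left = rng.normal(size=(4, 3, 3, 3))    # 4 output channels, 3x3 kernels
w_right = rng.normal(size=(4, 3, 3, 3))

y = bilinear_conv2d(x, w_left, w_right)   # shape (4, 6, 6)

# Each output is a quadratic form in the input, so scaling the input
# by 2 scales every output by 4.
y_scaled = bilinear_conv2d(2.0 * x, w_left, w_right)
```

Because each output entry is a quadratic function of the input patch, the layer can be analyzed directly from its weights, which is the property the decomposition relies on.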
Findings
Bilinear variants perform comparably to their standard counterparts in model-free RL, with a side-by-side comparison on ProcGen environments.
Decomposition techniques reveal interpretable low-rank structure.
The methodology allows causal validation of concept-based probes.
Abstract
Efforts to interpret reinforcement learning (RL) models often rely on high-level techniques such as attribution or probing, which provide only correlational insights and coarse causal control. This work proposes replacing nonlinearities in convolutional neural networks (ConvNets) with bilinear variants, producing a class of models for which these limitations can be addressed. We show that bilinear model variants perform comparably in model-free reinforcement learning settings, and give a side-by-side comparison on ProcGen environments. The analytic structure of bilinear layers enables weight-based decomposition. Previous work has shown that bilinearity enables quantifying functional importance through eigendecomposition, to identify interpretable low-rank structure. We show how to adapt the decomposition to convolution layers by applying singular value decomposition to vectors of interest, to separate…
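The eigendecomposition idea credited to previous work can be sketched for a dense (non-convolutional) bilinear layer. Under the assumption that the layer computes (Wx) ⊙ (Vx) without biases, the readout along any output direction u is a quadratic form x^T Q x, and the eigenvectors of the symmetrized Q rank input directions by their contribution. The names `W`, `V`, `u`, and the toy dimensions are hypothetical; the SVD-based adaptation to convolution layers is cut off in the abstract and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 6, 4
W = rng.normal(size=(d_out, d_in))   # hypothetical left weights
V = rng.normal(size=(d_out, d_in))   # hypothetical right weights


def bilinear(x):
    """Assumed bilinear layer: elementwise product of two linear maps."""
    return (W @ x) * (V @ x)


u = rng.normal(size=d_out)           # hypothetical output direction of interest

# u . bilinear(x) = x^T Q x, where Q = sum_k u_k * outer(W[k], V[k])
Q = np.einsum("k,ki,kj->ij", u, W, V)
Q_sym = 0.5 * (Q + Q.T)              # only the symmetric part affects x^T Q x

# Real eigenvalues and orthonormal eigenvectors of the symmetric matrix
eigvals, eigvecs = np.linalg.eigh(Q_sym)

x = rng.normal(size=d_in)
direct = u @ bilinear(x)
via_eig = np.sum(eigvals * (eigvecs.T @ x) ** 2)

# Low-rank approximation: keep the two eigenvectors with largest |eigenvalue|
top = np.argsort(-np.abs(eigvals))[:2]
approx = np.sum(eigvals[top] * (eigvecs[:, top].T @ x) ** 2)
```

Here `direct` and `via_eig` agree up to floating-point error, while `approx` shows how a truncated spectrum yields a low-rank description of the same readout. With random weights the spectrum has no particular structure; the claim in the abstract is that trained models do.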
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Fault Detection and Control Systems · Topic Modeling
Methods: Convolution
