Bilinear Convolution Decomposition for Causal RL Interpretability

Narmeen Oozeer; Sinem Erisken; Alice Rigg

arXiv:2412.00944·cs.LG·December 3, 2024

TL;DR

This paper replaces the nonlinearities in convolutional reinforcement learning models with bilinear variants, which makes the layers analytically decomposable from their weights and allows concept-based probes to be validated causally.

Contribution

The paper proposes replacing the nonlinearities in ConvNets used for RL with bilinear variants, which enables analytic, weight-based decomposition and causal validation of interpretability findings.
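As a rough illustration of what "replacing a nonlinearity with a bilinear variant" can look like in a convolution layer, the sketch below multiplies the outputs of two convolutions elementwise instead of passing one convolution through an activation function. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `conv2d_valid` and `bilinear_conv`, the absence of biases, and the shapes are all hypothetical choices made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_valid(x, w):
    """Cross-correlate x (C_in, H, W) with w (C_out, C_in, k, k).

    No padding, stride 1, no bias (assumptions made for this sketch).
    """
    k = w.shape[-1]
    # patches has shape (C_in, H-k+1, W-k+1, k, k)
    patches = sliding_window_view(x, (k, k), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", patches, w)


def bilinear_conv(x, w_left, w_right):
    """Bilinear layer: the elementwise product of two linear convolutions."""
    return conv2d_valid(x, w_left) * conv2d_valid(x, w_right)


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))          # hypothetical input: 3 channels, 8x8
w_left = rng.normal(size=(4, 3, 3, 3))  # hypothetical weights: 4 output channels
w_right = rng.normal(size=(4, 3, 3, 3))

y = bilinear_conv(x, w_left, w_right)

# Without biases the layer is a pure quadratic form in x, so scaling
# the input by 2 scales every output by exactly 4.
y_scaled = bilinear_conv(2.0 * x, w_left, w_right)
```

Because every output is a quadratic function of the input, the layer can be studied directly from its weights, which is the property the decomposition relies on.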

Findings

01. Bilinear models perform comparably to traditional models in RL tasks.

02. Decomposition techniques reveal interpretable low-rank structures.

03. Methodology allows causal validation of concept-based probes.

Abstract

Efforts to interpret reinforcement learning (RL) models often rely on high-level techniques such as attribution or probing, which provide only correlational insights and coarse causal control. This work proposes replacing nonlinearities in convolutional neural networks (ConvNets) with bilinear variants, to produce a class of models for which these limitations can be addressed. We show that bilinear model variants perform comparably in model-free reinforcement learning settings, and give a side-by-side comparison on ProcGen environments. The analytic structure of bilinear layers enables weight-based decomposition. Previous work has shown that bilinearity enables quantifying functional importance through eigendecomposition, to identify interpretable low-rank structure. We show how to adapt the decomposition to convolution layers by applying singular value decomposition to vectors of interest, to separate…
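The eigendecomposition mentioned in the abstract can be sketched for the simpler fully connected case. For a bilinear layer g(x) = (Wx) ⊙ (Vx) and a chosen output direction u, the scalar u · g(x) equals the quadratic form xᵀQx with Q = Σₐ uₐ wₐ vₐᵀ, and only the symmetric part of Q contributes. The NumPy sketch below checks this identity numerically. The names `W`, `V`, `u`, and the dimensions are assumptions made for illustration; the convolutional adaptation described in the paper is not reproduced here, since the abstract above is truncated.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 6, 4                      # hypothetical layer sizes
W = rng.normal(size=(d_out, d_in))
V = rng.normal(size=(d_out, d_in))


def bilinear(x):
    """Fully connected bilinear layer without bias: (W x) * (V x)."""
    return (W @ x) * (V @ x)


u = rng.normal(size=d_out)              # hypothetical output direction of interest

# Interaction matrix: Q[i, j] = sum_a u[a] * W[a, i] * V[a, j]
Q = np.einsum("a,ai,aj->ij", u, W, V)
Q_sym = 0.5 * (Q + Q.T)                 # only the symmetric part affects x^T Q x

# Eigenvectors are input directions; eigenvalues weight their squared activations.
eigvals, eigvecs = np.linalg.eigh(Q_sym)

x = rng.normal(size=d_in)
direct = u @ bilinear(x)
via_eig = np.sum(eigvals * (eigvecs.T @ x) ** 2)

# Keeping only the largest-magnitude eigenvalues gives a low-rank approximation.
order = np.argsort(-np.abs(eigvals))
top = order[:2]
approx = np.sum(eigvals[top] * (eigvecs[:, top].T @ x) ** 2)
```

Here `direct` and `via_eig` agree up to floating-point error, while `approx` shows how a truncated spectrum yields a low-rank summary of the same output direction.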

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Fault Detection and Control Systems · Topic Modeling

Methods: Convolution