A causal framework for explaining the predictions of black-box sequence-to-sequence models

David Alvarez-Melis; Tommi S. Jaakkola

arXiv:1707.01943 · cs.LG · November 16, 2017

TL;DR

This paper introduces a causal framework for interpreting black-box sequence-to-sequence models. It uses perturbation-based analysis to identify groups of causally related input and output tokens, and it is tested on several NLP sequence generation tasks.

Contribution

The paper presents a causal explanation method for black-box models: it queries the model with perturbed inputs, builds a graph over tokens from the responses, and partitions that graph to select the most relevant token dependencies in sequence-to-sequence predictions.

Findings

01. Effective in explaining model predictions across NLP tasks

02. Identifies causally related token groups accurately

03. Applicable to various structured input-output models

Abstract

We interpret the predictions of any black-box structured input-structured output model around a specific input-output pair. Our method returns an "explanation" consisting of groups of input-output tokens that are causally related. These dependencies are inferred by querying the black-box model with perturbed inputs, generating a graph over tokens from the responses, and solving a partitioning problem to select the most relevant components. We focus the general approach on sequence-to-sequence problems, adopting a variational autoencoder to yield meaningful input perturbations. We test our method across several NLP sequence generation tasks.
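
The pipeline the abstract describes (perturb the input, query the black box, build a graph over tokens, partition it) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several pieces are simplifying assumptions: `black_box` is a hypothetical word-by-word toy translator, random token replacement stands in for the variational autoencoder, a simple difference of frequencies stands in for the dependency estimation, and thresholding plus connected components stands in for the partitioning problem. All names (`LEXICON`, `perturb`, `dependency_weights`, `explain`) are invented for illustration.

```python
import random
from collections import defaultdict

# Hypothetical black box: a word-by-word toy "translator" (assumption, not from the paper).
LEXICON = {"the": "le", "cat": "chat", "dog": "chien", "eats": "mange",
           "sees": "voit", "fish": "poisson", "bird": "oiseau"}


def black_box(tokens):
    """Stand-in for any sequence-to-sequence model we can only query."""
    return [LEXICON.get(t, t) for t in tokens]


def perturb(tokens, vocab, rng, rate=0.3):
    """Randomly replace tokens; a crude substitute for VAE-based perturbations."""
    return [rng.choice(vocab) if rng.random() < rate else t for t in tokens]


def dependency_weights(x, y, samples):
    """w[i][j]: how much more often y[j] vanishes from the output when x[i] is changed."""
    w = [[0.0] * len(y) for _ in x]
    for i in range(len(x)):
        changed = [out for inp, out in samples if inp[i] != x[i]]
        kept = [out for inp, out in samples if inp[i] == x[i]]
        for j, tok in enumerate(y):
            p_changed = sum(tok not in out for out in changed) / max(1, len(changed))
            p_kept = sum(tok not in out for out in kept) / max(1, len(kept))
            w[i][j] = p_changed - p_kept
    return w


def explain(x, n_samples=500, threshold=0.5, seed=0):
    """Return groups of (input tokens, output tokens) linked by strong dependencies."""
    rng = random.Random(seed)
    vocab = sorted(LEXICON)
    y = black_box(x)
    samples = []
    for _ in range(n_samples):
        inp = perturb(x, vocab, rng)
        samples.append((inp, black_box(inp)))
    w = dependency_weights(x, y, samples)

    # Keep only strong edges of the bipartite token graph, then take connected
    # components with a small union-find (a stand-in for graph partitioning).
    parent = {("in", i): ("in", i) for i in range(len(x))}
    parent.update({("out", j): ("out", j) for j in range(len(y))})

    def find(node):
        while parent[node] != node:
            node = parent[node]
        return node

    for i in range(len(x)):
        for j in range(len(y)):
            if w[i][j] >= threshold:
                parent[find(("in", i))] = find(("out", j))

    comps = defaultdict(lambda: ([], []))
    for i, tok in enumerate(x):
        comps[find(("in", i))][0].append(tok)
    for j, tok in enumerate(y):
        comps[find(("out", j))][1].append(tok)
    # An explanation component needs at least one input and one output token.
    return [g for g in comps.values() if g[0] and g[1]]


if __name__ == "__main__":
    for inputs, outputs in explain("the cat eats fish".split()):
        print(inputs, "->", outputs)
```

For this toy model, each input word should end up grouped with its own translation, since changing a word is what makes its translation disappear from the output. A real sequence-to-sequence model would yield noisier weights and larger, overlapping groups, which is where a proper partitioning step matters.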
