Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts

Samuel Carton; Qiaozhu Mei; Paul Resnick

arXiv:1809.01499 · cs.CL · October 23, 2018 · 1 citation

TL;DR

This paper presents an adversarial approach for generating high-recall extractive explanations of neural text classifier decisions. It targets the detection of personal attacks in social media comments and emphasizes interpretability and the model's bias term, which sets its default behavior.

Contribution

It introduces an adversarial layer that raises the recall of extractive explanations. It also shows the importance of explicitly setting a semantically appropriate default behavior, through the model's bias term, when detecting personal attacks.
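
To make the role of the bias term concrete, here is a minimal sketch. It assumes a predictor whose logit is a fixed bias plus evidence from the highlighted tokens only. Under that assumption, an empty explanation is classified purely by the bias, so the bias encodes the model's default. The names `predict_attack` and `DEFAULT_BIAS`, the value -3.0, and the per-token evidence scores are hypothetical. The sketch also assumes that the semantically appropriate default is "not an attack" and that the bias is held fixed rather than learned. The text above does not spell out either point.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_attack(token_evidence: Sequence[float], mask: Sequence[int], bias: float) -> float:
    """Hypothetical toy predictor: P(personal attack) from the highlighted tokens only.

    With an all-zero mask, the logit is just `bias`, so the bias alone
    decides what the model predicts for an empty explanation.
    """
    logit = bias + sum(e for e, m in zip(token_evidence, mask) if m)
    return sigmoid(logit)


# Hypothetical fixed (not learned) bias: with nothing highlighted, default to "not an attack".
DEFAULT_BIAS = -3.0

evidence = [0.0, 5.0, 0.0]  # made-up per-token evidence for "you idiot !"
print(predict_attack(evidence, [0, 0, 0], DEFAULT_BIAS) < 0.5)  # True: empty explanation -> not an attack
print(predict_attack(evidence, [0, 1, 0], DEFAULT_BIAS) > 0.5)  # True: highlighting "idiot" -> attack
print(predict_attack(evidence, [0, 0, 0], bias=1.0) > 0.5)      # True: a positive bias flips the default
```

The last line shows the failure mode that setting the bias by hand is meant to avoid, under the assumptions above. If the bias were positive, an empty explanation would already be labeled an attack. The model could then flag attacks without highlighting any evidence for them.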

Findings

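01. Adding the adversarial layer improves the recall of the extractive explanations.

02. Explicitly setting the bias term to a semantically appropriate default behavior improves personal-attack detection.

03. Evaluation on a validation set of human-annotated personal attacks supports the effectiveness of these changes.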

Abstract

We introduce an adversarial method for producing high-recall explanations of neural text classifier decisions. Building on an existing architecture for extractive explanations via hard attention, we add an adversarial layer which scans the residual of the attention for remaining predictive signal. Motivated by the important domain of detecting personal attacks in social media comments, we additionally demonstrate the importance of manually setting a semantically appropriate 'default' behavior for the model by explicitly manipulating its bias term. We develop a validation set of human-annotated personal attacks to evaluate the impact of these changes.
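
The adversarial setup in the abstract can be written as a single loss for the explanation generator. This is a minimal sketch under stated assumptions, not the authors' implementation:

- The names `generator_loss` and `toy_linear_scorer` and the weights `sparsity_weight` and `adversary_weight` are hypothetical.
- The predictor and adversary are toy logistic scorers over a binary token mask, not neural networks.
- Subtracting the adversary's cross-entropy on the residual is one simple way to reward masks that leave no predictive signal behind. The paper's exact objective may differ.
- Other regularizers, such as a contiguity penalty, are omitted.

```python
import math
from typing import Callable, Sequence

Mask = Sequence[int]  # 1 = token kept in the explanation, 0 = token dropped


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(p: float, y: int, eps: float = 1e-9) -> float:
    """Binary cross-entropy for one example, with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def toy_linear_scorer(weights: Sequence[float], bias: float) -> Callable[[Mask], float]:
    """Hypothetical stand-in for a classifier: P(attack) computed from visible tokens only."""
    def score(mask: Mask) -> float:
        return sigmoid(bias + sum(w * m for w, m in zip(weights, mask)))
    return score


def generator_loss(
    mask: Mask,
    y: int,
    predictor: Callable[[Mask], float],
    adversary: Callable[[Mask], float],
    sparsity_weight: float = 0.1,
    adversary_weight: float = 1.0,
) -> float:
    """Assumed generator objective (lower is better), not the paper's exact formulation."""
    residual = [1 - m for m in mask]        # tokens NOT selected by the hard attention
    pred_loss = bce(predictor(mask), y)     # the explanation alone should support the label
    adv_loss = bce(adversary(residual), y)  # the adversary tries to recover y from the residual
    sparsity = sum(mask) / len(mask)        # fraction of tokens highlighted
    # Subtracting adv_loss rewards masks whose residual leaves the adversary guessing.
    return pred_loss + sparsity_weight * sparsity - adversary_weight * adv_loss


if __name__ == "__main__":
    # Toy comment "you idiot and moron" with made-up weights on the two insulting tokens.
    model = toy_linear_scorer([0.0, 4.0, 0.0, 4.0], bias=-2.0)
    one = [0, 1, 0, 0]   # highlights only "idiot"
    both = [0, 1, 0, 1]  # highlights "idiot" and "moron"
    for adv_w in (0.0, 1.0):
        l_one = generator_loss(one, 1, model, model, sparsity_weight=0.6, adversary_weight=adv_w)
        l_both = generator_loss(both, 1, model, model, sparsity_weight=0.6, adversary_weight=adv_w)
        print(f"adversary_weight={adv_w}: one={l_one:.3f} both={l_both:.3f}")
```

With these made-up weights, turning the adversary off (`adversary_weight=0.0`) lets the sparsity penalty favor highlighting only one of the two insulting tokens. Turning it on gives the mask that covers both tokens the lower loss, which is the high-recall behavior the abstract describes.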

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)