# Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop

**Authors:** Afra Alishahi, Grzegorz Chrupała, Tal Linzen

arXiv: 1904.04063 · 2019-04-09

## TL;DR

This report reviews representative studies from BlackboxNLP, a workshop held at EMNLP 2018 on analyzing, interpreting, and explaining neural network models for natural language processing.

## Contribution

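It groups representative workshop studies into four approaches to understanding neural models in NLP: systematically manipulating inputs, decoding interpretable knowledge from intermediate representations, modifying architectures to make their knowledge state or output more explainable, and testing networks on simplified or formal languages.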

## Key findings

- Decoding intermediate representations tests whether a neural model has acquired interpretable knowledge, such as linguistic knowledge.
- Systematically manipulating inputs and measuring the effect on performance sheds light on model behavior.
- Evaluating networks on simplified or formal languages examines their performance under controlled conditions.
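
As a rough illustration of the decoding approach in the first finding, the sketch below trains a simple classifier (often called a probing or diagnostic classifier) to predict a property from representation vectors. Everything here is an assumption made for illustration: the vectors are synthetic rather than taken from any model discussed at the workshop, the binary property is artificial, and the names `train_probe` and `probe_accuracy` are hypothetical.

```python
import numpy as np

def train_probe(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        z = np.clip(X @ w + b, -30.0, 30.0)   # clip logits for numerical safety
        p = 1.0 / (1.0 + np.exp(-z))          # predicted probability of class 1
        grad = p - y                          # gradient of the log loss w.r.t. z
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(X, y, w, b):
    """Fraction of examples whose label the probe predicts correctly."""
    pred = (X @ w + b > 0).astype(int)
    return float((pred == y).mean())

rng = np.random.default_rng(0)
n, d = 400, 16

# Synthetic stand-in for hidden states: a binary property (an artificial
# label, not real linguistic data) is encoded along one random direction.
y = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)
X_encoded = rng.normal(size=(n, d)) + np.outer(2 * y - 1, direction)

# Baseline: vectors of the same shape that carry no information about y.
X_noise = rng.normal(size=(n, d))

split = 300
w, b = train_probe(X_encoded[:split], y[:split])
acc = probe_accuracy(X_encoded[split:], y[split:], w, b)

w_c, b_c = train_probe(X_noise[:split], y[:split])
acc_control = probe_accuracy(X_noise[split:], y[split:], w_c, b_c)

print(f"probe on encoded vectors: {acc:.2f}")
print(f"probe on noise vectors:   {acc_control:.2f}")
```

Comparing the probe against the uninformative baseline matters: held-out accuracy well above the baseline suggests the property is decodable from the vectors, while accuracy near chance suggests it is not linearly recoverable by this probe.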

## Abstract

The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner workings and representations acquired by neural models of language. Approaches included: systematic manipulation of input to neural networks and investigating the impact on their performance, testing whether interpretable knowledge can be decoded from intermediate representations acquired by neural networks, proposing modifications to neural network architectures to make their knowledge state or generated output more explainable, and examining the performance of networks on simplified or formal languages. Here we review a number of representative studies in each category.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.04063/full.md

## References

85 references — full list in the complete paper: https://tomesphere.com/paper/1904.04063/full.md

---
Source: https://tomesphere.com/paper/1904.04063