Interpreting Neural Networks to Improve Politeness Comprehension
Malika Aubakirova, Mohit Bansal

TL;DR
This paper introduces an interpretable neural network model for predicting politeness in natural language requests. The model outperforms traditional feature-based methods, and analyzing it reveals linguistic markers and politeness strategies that improve both our understanding of politeness and prediction accuracy.
Contribution
The paper presents a simple CNN-based approach that operates directly on raw text, avoiding manual feature engineering, and analyzes which linguistic features of politeness the network actually learns.
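To make "a simple CNN on raw text" concrete, here is a minimal sketch of the forward pass of a generic text CNN: embedding lookup, a convolution over windows of consecutive tokens, ReLU, max-over-time pooling, and a sigmoid output. It is written in pure Python with random toy weights; the vocabulary handling, window width, filter count, and helper names (`random_params`, `predict_polite`) are illustrative assumptions, not the authors' architecture or hyperparameters, and no training is shown.

```python
import math
import random

def embed(tokens, vocab, table):
    """Look up an embedding vector per token; unknown words map to index 0."""
    return [table[vocab.get(t, 0)] for t in tokens]

def conv_max_pool(emb, filters, biases, width):
    """Slide each filter over every `width`-token window, apply ReLU, then max-over-time pool."""
    dim = len(emb[0])
    if len(emb) < width:  # zero-pad short inputs so at least one window exists
        emb = emb + [[0.0] * dim] * (width - len(emb))
    pooled = []
    for f, b in zip(filters, biases):
        best = 0.0  # ReLU outputs are >= 0, so 0 is a safe starting maximum
        for start in range(len(emb) - width + 1):
            window = emb[start:start + width]
            s = b + sum(f[k][d] * window[k][d] for k in range(width) for d in range(dim))
            best = max(best, max(0.0, s))
        pooled.append(best)
    return pooled

def predict_polite(tokens, params):
    """Return P(polite) from the pooled filter activations via a logistic output layer."""
    emb = embed(tokens, params["vocab"], params["table"])
    h = conv_max_pool(emb, params["filters"], params["biases"], params["width"])
    z = params["out_b"] + sum(w * x for w, x in zip(params["out_w"], h))
    return 1.0 / (1.0 + math.exp(-z))

def random_params(words, dim=8, width=3, n_filters=4, seed=0):
    """Hypothetical untrained parameters, only to make the sketch runnable."""
    rng = random.Random(seed)
    vocab = {"<unk>": 0}
    for w in words:
        vocab.setdefault(w, len(vocab))

    def rand_vec(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "vocab": vocab,
        "table": [rand_vec(dim) for _ in vocab],
        "width": width,
        "filters": [[rand_vec(dim) for _ in range(width)] for _ in range(n_filters)],
        "biases": rand_vec(n_filters),
        "out_w": rand_vec(n_filters),
        "out_b": 0.0,
    }

PARAMS = random_params("could you please help me with this".split())
prob = predict_polite("could you please help".split(), PARAMS)
```

Max-over-time pooling keeps only the strongest window per filter, which is also what makes such filters inspectable: each pooled value points back to one n-gram in the request.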
Findings
Neural networks outperform feature-based models in politeness prediction.
Visualization techniques reveal subtle linguistic markers of politeness.
Adding identified politeness strategies as features improves model accuracy.
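The last finding describes feeding discovered strategies back into the model as extra features. A minimal sketch of that idea, assuming each strategy can be approximated by a binary cue-word indicator appended to the network's pooled features before a logistic output layer: the cue lists and names below (`STRATEGY_CUES`, `augmented_features`) are hypothetical placeholders, not the strategies identified in the paper.

```python
import math

# Hypothetical cue words for illustration only; NOT the paper's discovered strategies.
STRATEGY_CUES = {
    "gratitude": {"thanks", "thank"},
    "please": {"please"},
    "apology": {"sorry", "apologize"},
    "hedge": {"maybe", "perhaps"},
}
STRATEGY_ORDER = sorted(STRATEGY_CUES)  # fixed feature order: apology, gratitude, hedge, please

def strategy_features(tokens):
    """One 0/1 indicator per strategy: 1.0 if any of its cue words occurs in the request."""
    lowered = {t.lower() for t in tokens}
    return [1.0 if STRATEGY_CUES[name] & lowered else 0.0 for name in STRATEGY_ORDER]

def augmented_features(cnn_features, tokens):
    """Concatenate learned (e.g., max-pooled CNN) features with the strategy indicators."""
    return list(cnn_features) + strategy_features(tokens)

def logistic_score(features, weights, bias):
    """Logistic output layer over the combined feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

tokens = "Could you please help , thanks".split()
feats = augmented_features([0.3, 0.0, 1.2], tokens)
# feats == [0.3, 0.0, 1.2, 0.0, 1.0, 0.0, 1.0]
```

Keeping the indicators as separate input dimensions lets the output layer learn a weight per strategy, so whether a strategy helps can be read off from held-out accuracy with and without it.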
Abstract
We present an interpretable neural network approach to predicting and understanding politeness in natural language requests. Our models are based on simple convolutional neural networks that operate directly on raw text, avoiding any manual identification of complex sentiment or syntactic features, while performing better than such feature-based models from previous work. More importantly, we use the challenging task of politeness prediction as a testbed to then present a much-needed understanding of what these successful networks are actually learning. For this, we present several network visualizations based on activation clusters, first derivative saliency, and embedding space transformations, helping us automatically identify several subtle linguistic markers of politeness theories. Further, this analysis reveals multiple novel, high-scoring politeness strategies which, when added back as new…
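First derivative saliency scores each input word by how strongly the model's class score responds to small changes in that word's embedding, i.e., by the magnitude of the gradient of the score with respect to the embedding. The sketch below assumes a toy CNN score (filters, ReLU, max-over-time pooling, summed over filters), approximates the gradient with central finite differences as a stand-in for backpropagation, and summarizes each token by the L2 norm of its gradient (one common choice). All names are illustrative, and finite differences can be inaccurate exactly at ReLU or max kinks.

```python
import math

def cnn_score(emb, filters, width):
    """Toy class score: each filter's max-over-time ReLU activation, summed over filters."""
    dim = len(emb[0])
    total = 0.0
    for f in filters:
        best = 0.0  # max(0, max_s) equals the max of ReLU outputs
        for start in range(len(emb) - width + 1):
            s = sum(f[k][d] * emb[start + k][d] for k in range(width) for d in range(dim))
            best = max(best, s)
        total += best
    return total

def saliency(emb, score_fn, eps=1e-4):
    """Per-token L2 norm of d(score)/d(embedding), estimated by central differences."""
    scores = []
    for i in range(len(emb)):
        sq = 0.0
        for d in range(len(emb[i])):
            orig = emb[i][d]
            emb[i][d] = orig + eps
            up = score_fn(emb)
            emb[i][d] = orig - eps
            down = score_fn(emb)
            emb[i][d] = orig  # restore the embedding after probing
            g = (up - down) / (2 * eps)
            sq += g * g
        scores.append(math.sqrt(sq))
    return scores

# Four tokens with 2-d embeddings; one width-1 filter that reads only dimension 0.
toy = [[0.1, 0.0], [0.9, 0.0], [0.2, 0.0], [0.3, 0.0]]
demo = saliency(toy, lambda e: cnn_score(e, [[[1.0, 0.0]]], 1))
```

In this toy case only the second token wins the max-pooling step, so it is the only token whose perturbation changes the score; the other tokens get zero saliency. In a trained model, the tokens with high saliency are candidate politeness markers to inspect.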
