Measuring Memorization Effect in Word-Level Neural Networks Probing
Rudolf Rosa, Tomáš Musil, David Mareček

TL;DR
This paper introduces a method to quantify memorization in word-level neural network probing, helping distinguish between true linguistic knowledge and simple memorization, thereby improving interpretability of NLP models.
Contribution
It proposes a novel, simple approach to measure memorization effects in probing classifiers, enhancing reliability in interpreting neural network representations.
Findings
The method effectively quantifies memorization in probing tasks.
Application to POS probing in translation models demonstrates its utility.
Results highlight the importance of accounting for memorization in NLP interpretability.
Abstract
Multiple studies have probed representations emerging in neural networks trained for end-to-end NLP tasks and examined what word-level linguistic information may be encoded in the representations. In classical probing, a classifier is trained on the representations to extract the target linguistic information. However, there is a threat of the classifier simply memorizing the linguistic labels for individual words, instead of extracting the linguistic abstractions from the representations, thus reporting false positive results. While considerable efforts have been made to minimize the memorization problem, the task of actually measuring the amount of memorization happening in the classifier has been understudied so far. In our work, we propose a simple general method for measuring the memorization effect, based on a symmetric selection of comparable sets of test words seen versus unseen…
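The seen-versus-unseen comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it only splits test words by whether they occurred in the probe's training data and compares accuracy on the two groups, and it does not reproduce the paper's symmetric selection of comparable sets. The function name `seen_unseen_accuracy`, its parameters, and the toy data are all hypothetical.

```python
from typing import Dict, Iterable, Sequence


def seen_unseen_accuracy(
    test_words: Sequence[str],
    gold_labels: Sequence[str],
    predicted_labels: Sequence[str],
    train_words: Iterable[str],
) -> Dict[str, float]:
    """Compare probe accuracy on test words seen vs. unseen in probe training.

    Simplifying assumption: a word counts as "seen" if its surface form
    appears anywhere in the probe's training data.
    """
    train_vocab = set(train_words)
    counts = {"seen": [0, 0], "unseen": [0, 0]}  # [correct, total] per group

    for word, gold, pred in zip(test_words, gold_labels, predicted_labels):
        group = "seen" if word in train_vocab else "unseen"
        counts[group][0] += int(gold == pred)
        counts[group][1] += 1

    result: Dict[str, float] = {}
    for group, (correct, total) in counts.items():
        # An empty group has no defined accuracy.
        result[group] = correct / total if total else float("nan")

    # A large positive gap suggests the probe relies on memorized word labels.
    result["gap"] = result["seen"] - result["unseen"]
    return result


# Toy data (hypothetical): POS tags for four test words.
example = seen_unseen_accuracy(
    test_words=["dog", "runs", "cat", "sleeps"],
    gold_labels=["NOUN", "VERB", "NOUN", "VERB"],
    predicted_labels=["NOUN", "VERB", "NOUN", "NOUN"],
    train_words=["dog", "runs"],
)
print(example)
```

In this toy case the probe is correct on both seen words and on one of the two unseen words. The raw gap alone can mislead, since seen and unseen words may differ in frequency or label distribution, which is why the abstract stresses selecting comparable sets.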
