TL;DR
This paper introduces Pareto hypervolume as a metric for the trade-off between probe complexity and performance when probing contextual word representations for linguistic structure. It shows the limitations of simple probing tasks and advocates harder ones, such as full dependency parsing.
Contribution
It proposes Pareto hypervolume as a new evaluation metric that captures the trade-off between probe complexity and performance, and advocates full dependency parsing as a more informative probing task.
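To make the metric concrete, here is a minimal Python sketch of a two-dimensional Pareto hypervolume over (complexity, accuracy) pairs, one pair per trained probe. It assumes complexity is already normalized to [0, 1] and uses the reference point (complexity = 1, accuracy = 0). The normalization, the reference point, the function names `pareto_frontier` and `pareto_hypervolume`, and the example numbers are all illustrative assumptions, not the paper's exact implementation. A probe is kept on the frontier only if no other probe is at least as simple and at least as accurate. The hypervolume is the area that the frontier dominates, so a representation that reaches high accuracy with simple probes scores higher.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (complexity, accuracy); lower complexity and higher accuracy are better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing complexity."""
    # Sort by complexity ascending; on ties, the more accurate probe comes first.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    frontier: List[Point] = []
    best_acc = float("-inf")
    for complexity, accuracy in ordered:
        # Keep a probe only if it beats every simpler (or equally simple) probe.
        if accuracy > best_acc:
            frontier.append((complexity, accuracy))
            best_acc = accuracy
    return frontier


def pareto_hypervolume(points: List[Point], ref: Point = (1.0, 0.0)) -> float:
    """Area dominated by the Pareto frontier, bounded by ref = (max complexity, min accuracy).

    Assumption: complexity is normalized to [0, 1], so the default ref is (1.0, 0.0).
    """
    ref_c, ref_a = ref
    # Points beyond the reference box contribute no area.
    frontier = [p for p in pareto_frontier(points) if p[0] <= ref_c and p[1] >= ref_a]
    area = 0.0
    for i, (c, a) in enumerate(frontier):
        # Each frontier point dominates a strip up to the next point's complexity.
        next_c = frontier[i + 1][0] if i + 1 < len(frontier) else ref_c
        area += (next_c - c) * (a - ref_a)
    return area


if __name__ == "__main__":
    # Hypothetical probe results for one representation (not data from the paper).
    probes = [(0.1, 0.6), (0.3, 0.8), (0.5, 0.7), (0.9, 0.9)]
    print(pareto_frontier(probes))               # (0.5, 0.7) is dominated by (0.3, 0.8)
    print(round(pareto_hypervolume(probes), 4))  # 0.69
```

Because the score is an area rather than a single accuracy, two representations with the same best-case accuracy can still be told apart when one of them needs much more complex probes to reach it.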
Findings
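Probe results often contradict expectations about which representations encode which information (e.g., non-contextual fastText appears to encode more morpho-syntactic information than contextual BERT).
Common, simple probing tasks such as part-of-speech labeling and dependency arc labeling are inadequate for evaluating linguistic structure.
Dependency parsing reveals a wide gap in syntactic knowledge between contextual and non-contextual representations.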
Abstract
The question of how to probe contextual word representations for linguistic structure in a way that is both principled and useful has seen significant attention recently in the NLP literature. In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume. To measure complexity, we present a number of parametric and non-parametric metrics. Our experiments using Pareto hypervolume as an evaluation metric show that probes often do not conform to our expectations -- e.g., why should the non-contextual fastText representations encode more morpho-syntactic information than the contextual BERT representations? These results suggest that common, simplistic probing tasks, such as part-of-speech labeling and dependency arc labeling, are inadequate to evaluate the linguistic structure…
Taxonomy
Methods: Linear Layer · Multi-Head Attention · Dropout · Softmax · Attention Dropout · Residual Connection · Dense Connections · fastText · WordPiece · Layer Normalization
