Analyzing Text Representations under Tight Annotation Budgets: Measuring Structural Alignment
César González-Gutiérrez, Audi Primadhanty, Francesco Cazzaro, Ariadna Quattoni

TL;DR
This paper introduces a metric to evaluate how well text representations align structurally with tasks, demonstrating that better alignment leads to improved learning from limited annotated data.
Contribution
The paper proposes a novel metric for measuring structural alignment of text representations and empirically shows its importance in low-annotation regimes.
Findings
Aligned representations improve few-sample learning
Efficient representations induce better input-class structure alignment
The metric correlates with model performance under tight annotation budgets
Abstract
Annotating large collections of textual data can be time-consuming and expensive. That is why the ability to train models with limited annotation budgets is of great importance. In this context, it has been shown that under tight annotation budgets the choice of data representation is key. The goal of this paper is to better understand why this is so. With this goal in mind, we propose a metric that measures the extent to which a given representation is structurally aligned with a task. We conduct experiments on several text classification datasets testing a variety of models and representations. Using our proposed metric we show that an efficient representation for a task (i.e., one that enables learning from few samples) is a representation that induces a good alignment between latent input structure and class structure.
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Topic Modeling
