Analyzing Text Representations under Tight Annotation Budgets: Measuring Structural Alignment

César González-Gutiérrez; Audi Primadhanty; Francesco Cazzaro; Ariadna Quattoni

arXiv:2210.05721 · cs.CL · October 13, 2022

Open Access

TL;DR

This paper introduces a metric to evaluate how well text representations align structurally with tasks, demonstrating that better alignment leads to improved learning from limited annotated data.

Contribution

The paper proposes a novel metric for measuring structural alignment of text representations and empirically shows its importance in low-annotation regimes.

Findings

01 Aligned representations improve few-sample learning

02 Efficient representations induce better input-class structure alignment

03 The metric correlates with model performance under tight annotation budgets

Abstract

Annotating large collections of textual data can be time-consuming and expensive. That is why the ability to train models with limited annotation budgets is of great importance. In this context, it has been shown that under tight annotation budgets the choice of data representation is key. The goal of this paper is to better understand why this is so. With this goal in mind, we propose a metric that measures the extent to which a given representation is structurally aligned with a task. We conduct experiments on several text classification datasets, testing a variety of models and representations. Using our proposed metric, we show that an efficient representation for a task (i.e., one that enables learning from few samples) is a representation that induces a good alignment between latent input structure and class structure.

Taxonomy

Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Topic Modeling