Structural analysis of an all-purpose question answering model

Vincent Micheli; Quentin Heinrich; François Fleuret; Wacim Belblidia

arXiv:2104.06045 · cs.CL · April 14, 2021

TL;DR

This paper analyzes a new all-purpose question answering model, revealing that attention heads specialize in specific tasks and that the model maintains single-task performance without strong transfer effects.

Contribution

It introduces a novel question answering model and provides a structural analysis of its attention mechanisms, highlighting task-specific head specialization.

Findings

01

Attention heads specialize in particular tasks

02

Model retains single-task performance even without a strong transfer effect between tasks

03

Some attention heads are more conducive to learning than others, in both the multi-task and single-task settings

Abstract

Attention is a key component of the now ubiquitous pre-trained language models. By learning to focus on relevant pieces of information, these Transformer-based architectures have proven capable of tackling several tasks at once and sometimes even surpass their single-task counterparts. To better understand this phenomenon, we conduct a structural analysis of a new all-purpose question answering model that we introduce. Surprisingly, this model retains single-task performance even in the absence of a strong transfer effect between tasks. Through attention head importance scoring, we observe that attention heads specialize in a particular task and that some heads are more conducive to learning than others in both the multi-task and single-task settings.
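The abstract does not spell out how head importance is scored. As a rough illustration only, the sketch below uses one common proxy: the mean absolute gradient of the loss with respect to a per-head mask, evaluated with all masks set to 1. The toy "model" (a prediction that is simply the sum of scalar head contributions), the squared-error loss, and the names `head_importance`, `toy_outputs`, and `toy_targets` are all assumptions made for this example and are not taken from the paper.

```python
def head_importance(head_outputs, targets):
    """Mean absolute gradient of a squared-error loss w.r.t. each head's mask.

    head_outputs[i][h] is the (toy, scalar) contribution of head h on example i.
    The toy prediction is sum_h mask_h * head_outputs[i][h], with every mask_h = 1.
    This is an illustrative stand-in, not the paper's model or exact scoring rule.
    """
    num_heads = len(head_outputs[0])
    totals = [0.0] * num_heads
    for outputs, target in zip(head_outputs, targets):
        prediction = sum(outputs)  # all masks equal 1
        error = prediction - target
        for h, o in enumerate(outputs):
            # d/d mask_h of (prediction - target)^2 = 2 * error * o
            totals[h] += abs(2.0 * error * o)
    return [t / len(head_outputs) for t in totals]


# Hypothetical data: 2 examples, 3 heads; head 2 never contributes.
toy_outputs = [[1.0, 0.5, 0.0], [2.0, -0.5, 0.0]]
toy_targets = [1.0, 2.0]

scores = head_importance(toy_outputs, toy_targets)
ranking = sorted(range(len(scores)), key=lambda h: scores[h], reverse=True)
print(scores)   # [1.5, 0.5, 0.0]
print(ranking)  # [0, 1, 2]
```

Under these assumptions, running the same scoring separately on each task's examples and comparing the resulting rankings is one way to check whether a head matters mostly for a single task, which is the kind of specialization the abstract describes.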

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning