Structural analysis of an all-purpose question answering model
Vincent Micheli, Quentin Heinrich, François Fleuret, Wacim Belblidia

TL;DR
This paper analyzes a new all-purpose question answering model, revealing that attention heads specialize in specific tasks and that the model maintains single-task performance without strong transfer effects.
Contribution
It introduces a novel question answering model and provides a structural analysis of its attention mechanisms, highlighting task-specific head specialization.
Findings
Attention heads specialize in particular tasks
Model retains single-task performance even without a strong transfer effect between tasks
Some attention heads are more conducive to learning than others, in both the multi-task and single-task settings
Abstract
Attention is a key component of the now ubiquitous pre-trained language models. By learning to focus on relevant pieces of information, these Transformer-based architectures have proven capable of tackling several tasks at once and sometimes even surpass their single-task counterparts. To better understand this phenomenon, we conduct a structural analysis of a new all-purpose question answering model that we introduce. Surprisingly, this model retains single-task performance even in the absence of a strong transfer effect between tasks. Through attention head importance scoring, we observe that attention heads specialize in a particular task and that some heads are more conducive to learning than others in both the multi-task and single-task settings.
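The abstract does not say how attention head importance is scored, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes an ablation-style proxy: a head's importance is taken to be the increase in loss when that head's output is masked to zero. The toy attention layer, the random weights, the squared-error loss, and all function names are hypothetical and chosen purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, head_mask):
    """Toy self-attention layer (hypothetical, for illustration only).

    x: (seq, d_model); wq/wk/wv: (heads, d_model, d_head);
    wo: (heads, d_head, d_model); head_mask: (heads,) with entries in [0, 1].
    """
    out = np.zeros_like(x)
    for h in range(wq.shape[0]):
        q, k, v = x @ wq[h], x @ wk[h], x @ wv[h]
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        # The mask scales each head's contribution; 0 ablates the head.
        out += head_mask[h] * (attn @ v @ wo[h])
    return out

def head_importance(x, target, params):
    """Assumed proxy: loss increase when a single head is masked out."""
    n_heads = params[0].shape[0]

    def loss(mask):
        pred = multi_head_attention(x, *params, mask)
        return float(np.mean((pred - target) ** 2))

    base = loss(np.ones(n_heads))
    scores = np.empty(n_heads)
    for h in range(n_heads):
        mask = np.ones(n_heads)
        mask[h] = 0.0
        scores[h] = loss(mask) - base
    return scores

def demo(seed=0):
    rng = np.random.default_rng(seed)
    seq, d_model, n_heads, d_head = 5, 8, 4, 2
    x = rng.normal(size=(seq, d_model))
    wq, wk, wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
    wo = rng.normal(size=(n_heads, d_head, d_model))
    wo[2] = 0.0  # head 2 is made inert, so ablating it should change nothing
    params = (wq, wk, wv, wo)
    # The "task" here is simply to reproduce the unmasked layer output.
    target = multi_head_attention(x, *params, np.ones(n_heads))
    return head_importance(x, target, params)

if __name__ == "__main__":
    print(demo())
```

In this toy setup the inert head receives a score of zero and every other head a positive score. Computing such scores separately on each task's data would be one way to see whether different heads matter for different tasks, which is the kind of specialization the abstract reports.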
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
