A Study on How Attention Scores in the BERT Model are Aware of Lexical Categories in Syntactic and Semantic Tasks on the GLUE Benchmark
Dongjun Jang, Sungjoo Byun, Hyopil Shin

TL;DR
This paper investigates how BERT's attention scores reflect lexical categories during fine-tuning on GLUE tasks, revealing task-dependent and task-agnostic lexical biases in attention patterns.
Contribution
It provides empirical evidence that BERT's attention scores are influenced by lexical categories and identifies layers with consistent lexical biases across tasks.
Findings
Attention scores vary with lexical categories during fine-tuning.
Content words are emphasized in semantic tasks, function words in syntactic tasks.
Certain BERT layers show task-independent lexical biases.
Abstract
This study examines whether the attention scores between tokens in the BERT model significantly vary based on lexical categories during the fine-tuning process for downstream tasks. Drawing inspiration from the notion that in human language processing, syntactic and semantic information is parsed differently, we categorize tokens in sentences according to their lexical categories and focus on changes in attention scores among these categories. Our hypothesis posits that in downstream tasks that prioritize semantic information, attention scores centered on content words are enhanced, while in cases emphasizing syntactic information, attention scores centered on function words are intensified. Through experimentation conducted on six tasks from the GLUE benchmark dataset, we substantiate our hypothesis regarding the fine-tuning process. Furthermore, our additional investigations reveal…
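The category-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the content/function split by part-of-speech tag, the function names, and the toy attention weights are all assumptions made for illustration. It takes one attention matrix and a tag per token, then averages the attention weights for each (query class, key class) pair.

```python
from collections import defaultdict

# Hypothetical, simplified split: these POS tags count as content words,
# everything else is treated as a function word.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def lexical_class(pos_tag):
    """Map a POS tag to 'content' or 'function' (illustrative split only)."""
    return "content" if pos_tag in CONTENT_TAGS else "function"


def mean_attention_by_class(attention, pos_tags):
    """Average attention weight for each (query class, key class) pair.

    attention[i][j] is the weight token i assigns to token j, for example
    one head's softmax output, so each row sums to 1.
    """
    classes = [lexical_class(tag) for tag in pos_tags]
    totals = defaultdict(float)
    counts = defaultdict(int)
    for i, row in enumerate(attention):
        for j, weight in enumerate(row):
            pair = (classes[i], classes[j])
            totals[pair] += weight
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Toy example: "the cat sleeps" with made-up attention weights.
pos_tags = ["DET", "NOUN", "VERB"]
attention = [
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
    [0.1, 0.6, 0.3],
]
scores = mean_attention_by_class(attention, pos_tags)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

Running the same aggregation on a model before and after fine-tuning, and comparing the per-pair averages, would be one way to look for the kind of shift the hypothesis predicts.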
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Multi-Head Attention · Softmax · Dropout · Dense Connections · Adam · Focus
