Probing the Category of Verbal Aspect in Transformer Language Models
Anisia Katinskaia, Roman Yangarber

TL;DR
This study probes how transformer language models encode Russian verbal aspect. BERT and RoBERTa encode aspect mostly in their final layers, and their aspect predictions respond to the semantic feature of boundedness, which has implications for fine-tuning strategies.
Contribution
This is the first study of how pretrained transformer models encode verbal aspect. It examines Russian, using behavioral and causal probing to analyze the models' internal representations.
Findings
Models encode aspect mostly in final layers.
Counterfactual interventions influence aspect predictions in line with grammar.
Fine-tuning the final layers is faster and more effective.
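The last finding can be illustrated with a minimal sketch of how one might select only the final encoder layers for training. This is not the authors' code. The helper `trainable_parameter_names`, the 12-layer depth, and the choice of `last_k=2` are assumptions made for illustration; the parameter names follow the `encoder.layer.N.` pattern used by common BERT implementations.

```python
import re


def trainable_parameter_names(param_names, num_layers, last_k):
    """Return the names of parameters in the last `last_k` encoder layers.

    Hypothetical helper: everything not returned would be frozen.
    """
    if not 0 < last_k <= num_layers:
        raise ValueError("last_k must be between 1 and num_layers")
    first_trainable = num_layers - last_k
    layer_pattern = re.compile(r"encoder\.layer\.(\d+)\.")
    selected = []
    for name in param_names:
        match = layer_pattern.search(name)
        # Embeddings and other non-layer parameters never match and stay frozen.
        if match and int(match.group(1)) >= first_trainable:
            selected.append(name)
    return selected


# Illustrative parameter names (assumed, not taken from the paper).
names = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.10.output.dense.weight",
    "encoder.layer.11.output.dense.weight",
]
print(trainable_parameter_names(names, num_layers=12, last_k=2))
```

In a PyTorch setup, the returned names could be compared against `model.named_parameters()` and `requires_grad` set to `False` for every other parameter, so that gradients are computed only for the final layers.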
Abstract
We investigate how pretrained language models (PLMs) encode the grammatical category of verbal aspect in Russian. Encoding of aspect in transformer LMs has not been studied previously in any language. A particular challenge is posed by "alternative contexts", where either the perfective or the imperfective aspect is suitable grammatically and semantically. We perform probing using BERT and RoBERTa on alternative and non-alternative contexts. First, we assess the models' performance on aspect prediction, via behavioral probing. Next, we examine the models' performance when their contextual representations are substituted with counterfactual representations, via causal probing. These counterfactuals alter the value of the "boundedness" feature, a semantic feature that characterizes the action in the context. Experiments show that BERT and RoBERTa do encode aspect, mostly in their final…
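The behavioral probing step described in the abstract can be sketched roughly as follows. This summary does not give the paper's evaluation protocol, so the scoring rule here is an assumption: the model is taken to assign a log-probability to a perfective and an imperfective candidate form in the same slot, and a prediction counts as correct if the higher-scoring aspect is among those acceptable in the context (both are acceptable in an alternative context). The names `predict_aspect` and `probing_accuracy` and all scores are invented for illustration.

```python
def predict_aspect(scores):
    """Pick the aspect whose candidate verb form received the higher score."""
    return "pfv" if scores["pfv"] > scores["ipfv"] else "ipfv"


def probing_accuracy(examples):
    """Fraction of contexts where the predicted aspect is acceptable."""
    correct = 0
    for example in examples:
        if predict_aspect(example["scores"]) in example["acceptable"]:
            correct += 1
    return correct / len(examples)


# Made-up log-probabilities; "acceptable" holds the aspects valid in context.
examples = [
    {"scores": {"pfv": -1.2, "ipfv": -3.4}, "acceptable": {"pfv"}},
    {"scores": {"pfv": -2.8, "ipfv": -0.9}, "acceptable": {"ipfv"}},
    # Alternative context: either aspect is grammatical and meaningful.
    {"scores": {"pfv": -1.5, "ipfv": -1.1}, "acceptable": {"pfv", "ipfv"}},
    {"scores": {"pfv": -0.7, "ipfv": -2.0}, "acceptable": {"ipfv"}},
]
print(probing_accuracy(examples))  # prints 0.75
```

Under this rule an alternative context can never be scored as wrong, which is one reason such contexts are hard to evaluate and may call for a different metric.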
Taxonomy
Topics: Natural Language Processing Techniques
Methods: WordPiece · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Linear Layer · Adam · Attention Is All You Need · Residual Connection · Multi-Head Attention
