Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers
Luis Espinosa-Anke, Alexander Shvets, Alireza Mohammadshahi, James Henderson and Leo Wanner

TL;DR
This paper introduces a BERT-based sequence tagging model with a graph-aware transformer architecture for recognizing and categorizing lexical collocations across multiple languages, emphasizing the importance of syntactic dependency encoding.
Contribution
The paper presents a BERT-based sequence tagging model enhanced with a graph-aware transformer architecture for recognizing and categorizing lexical collocations in context, and uses it to examine how collocation typification differs across English, Spanish and French.
Findings
Explicit syntactic dependency encoding improves model performance.
The model is evaluated on collocation recognition in context for English, Spanish and French.
The results offer insights into how collocation typification differs across these three languages.
Abstract
Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP. However, it is a challenging task due to the varying degrees of frozenness lexical collocations exhibit. In this paper, we put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context. Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
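As a rough illustration of what "sequence tagging" means for this task, the sketch below decodes per-token BIO tags into labeled spans. It is not the authors' implementation: the tag names `base` and `collocate`, the function `decode_bio`, and the example sentence are all assumptions made for illustration, and the abstract does not specify the tagging scheme the paper actually uses.

```python
from typing import List, Optional, Tuple

# (start index, end index exclusive, label)
Span = Tuple[int, int, str]


def decode_bio(tags: List[str]) -> List[Span]:
    """Turn a BIO tag sequence into labeled token spans.

    Hypothetical helper: the label inventory is an assumption,
    not the scheme used in the paper.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # An "I-x" tag extends the open span only when the labels match.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the span that is currently open, if any.
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        # "B-x" opens a new span; a stray "I-x" is also treated as a start.
        if tag != "O":
            start, label = i, tag[2:]
    # Close a span that runs to the end of the sentence.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


# Illustrative sentence with assumed tags: "takes" is the collocate, "walk" the base.
tokens = ["She", "takes", "a", "long", "walk"]
tags = ["O", "B-collocate", "O", "O", "B-base"]
spans = decode_bio(tags)
labeled = [(" ".join(tokens[s:e]), lab) for s, e, lab in spans]
```

Here `labeled` pairs each recovered span with its tag, which shows why the base and the collocate of one collocation can be separated by other tokens and why syntactic information may help link them.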
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
