Parsing with Pretrained Language Models, Multiple Datasets, and Dataset Embeddings
Rob van der Goot, Miryam de Lhoneux

TL;DR
This paper investigates the effectiveness of dataset embeddings in transformer-based multilingual dependency parsers, demonstrating benefits especially for small or low-performing datasets and comparing different embedding strategies.
Contribution
The paper compares two methods for embedding the source dataset in a transformer-based multilingual dependency parser and provides an extensive evaluation, showing that dataset embeddings remain beneficial with contextualized transformer models.
Findings
Embedding dataset information improves parser performance.
Encoder-level dataset embedding yields the highest performance gains.
Training on combined datasets is comparable to language-based clustering.
Abstract
With increasing dataset availability, the potential for learning from a variety of data sources has grown. One method to improve learning from multiple data sources is to embed the data source during training. This allows the model to learn generalizable features as well as features that distinguish between datasets. However, these dataset embeddings were mostly used before contextualized transformer-based embeddings were introduced in the field of Natural Language Processing. In this work, we compare two methods to embed datasets in a transformer-based multilingual dependency parser, and perform an extensive evaluation. We show that: 1) embedding the dataset is still beneficial with these models; 2) performance increases are highest when embedding the dataset at the encoder level; 3) unsurprisingly, we confirm that performance increases are highest for small…
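The abstract contrasts embedding the dataset at the encoder level with embedding it elsewhere in the model. A minimal sketch of the two placements, assuming the alternative is a decoder-level embedding (concatenated to the encoder output before the task heads) and assuming the encoder-level variant adds the dataset vector to each input token embedding; these details are not stated in the summary above. `make_table`, `toy_encoder`, `encoder_level`, and `decoder_level` are hypothetical names, the treebank names are only examples, and `toy_encoder` is a linear stand-in for a real transformer, not the authors' implementation.

```python
import random
from typing import Dict, List

Vector = List[float]


def make_table(names: List[str], dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Hypothetical lookup table: one vector per dataset (learned in a real model,
    randomly initialised here)."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for name in names}


def toy_encoder(inputs: List[Vector]) -> List[Vector]:
    """Stand-in for a transformer encoder: mixes each token with the sentence mean,
    so every output depends on every input (a real model uses self-attention)."""
    dim = len(inputs[0])
    mean = [sum(vec[d] for vec in inputs) / len(inputs) for d in range(dim)]
    return [[0.5 * v + 0.5 * m for v, m in zip(vec, mean)] for vec in inputs]


def encoder_level(tokens: List[Vector], dataset: str, table: Dict[str, Vector]) -> List[Vector]:
    # Assumed encoder-level variant: add the dataset vector to every input token
    # embedding, so the dataset signal passes through the encoder.
    ds = table[dataset]
    inputs = [[t + d for t, d in zip(tok, ds)] for tok in tokens]
    return toy_encoder(inputs)


def decoder_level(tokens: List[Vector], dataset: str, table: Dict[str, Vector]) -> List[Vector]:
    # Assumed decoder-level variant: encode without dataset information, then
    # concatenate the dataset vector to each contextual vector for the task heads.
    ds = table[dataset]
    return [vec + ds for vec in toy_encoder(tokens)]


# Usage with example treebank names and a 3-token sentence of 4-d embeddings.
table = make_table(["en_ewt", "en_gum"], dim=4, seed=1)
sentence = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.5], [0.3, 0.3, 0.3, 0.3]]
enc = encoder_level(sentence, "en_ewt", table)
dec = decoder_level(sentence, "en_ewt", table)
print(len(enc[0]), len(dec[0]))  # 4 8
```

In this sketch the encoder-level variant keeps the hidden size unchanged, while the decoder-level variant widens each token vector by the size of the dataset embedding, which the downstream parser heads would have to accept.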
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
