Ensembling Transformers for Cross-domain Automatic Term Extraction

Hanh Thi Hong Tran, Matej Martinc, Andraz Pelicon, Antoine Doucet, and Senja Pollak

arXiv:2212.05696 · cs.CL · December 13, 2022

TL;DR

This study evaluates the effectiveness of Transformer-based pretrained language models for automatic term extraction across multiple languages and domains, demonstrating that monolingual models generally outperform multilingual ones, with ensemble methods further enhancing performance.

Contribution

The paper compares monolingual and multilingual Transformer models for cross-domain, multi-language term extraction and introduces ensemble strategies that improve extraction accuracy.

Findings

01. Monolingual models outperform multilingual models in most cases.

02. An ensemble of the top models yields significant performance improvements.

03. Performance varies depending on the language and on whether named entity terms are included.

Abstract

Automatic term extraction plays an essential role in domain language understanding and in several downstream natural language processing tasks. In this paper, we present a comparative study of the predictive power of Transformer-based pretrained language models for term extraction in a multi-language, cross-domain setting. Besides evaluating the ability of monolingual models to extract single- and multi-word terms, we also experiment with ensembles of mono- and multilingual models by taking the intersection or union of the term output sets of different language models. Our experiments have been conducted on the ACTER corpus covering four specialized domains (Corruption, Wind energy, Equitation, and Heart failure) and three languages (English, French, and Dutch), and on the RSDO5 Slovenian corpus covering four additional domains (Biomechanics, Chemistry, Veterinary, and Linguistics).…
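
The ensembling step described in the abstract, combining the term sets produced by different language models through intersection or union, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `normalize` and `ensemble_terms`, the normalization rule (lowercasing and collapsing whitespace), and the example term lists are all assumptions made for illustration.

```python
from typing import Iterable, Set


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so surface variants compare equal.

    Assumption: the source does not state how terms are normalized.
    """
    return " ".join(term.lower().split())


def ensemble_terms(outputs: Iterable[Iterable[str]], mode: str = "union") -> Set[str]:
    """Combine the term sets of several models by union or intersection."""
    sets = [{normalize(t) for t in out} for out in outputs]
    if not sets:
        return set()
    if mode == "union":
        # Keep every term proposed by at least one model.
        return set().union(*sets)
    if mode == "intersection":
        # Keep only the terms proposed by every model.
        return set.intersection(*sets)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical outputs of a monolingual and a multilingual model.
mono_terms = ["heart failure", "Ejection fraction", "beta blocker"]
multi_terms = ["heart failure", "ejection  fraction", "patient"]

union_terms = ensemble_terms([mono_terms, multi_terms], mode="union")
inter_terms = ensemble_terms([mono_terms, multi_terms], mode="intersection")
```

By construction, the union can only add candidate terms, so it cannot lower recall relative to either model alone, while the intersection keeps only terms on which all models agree and so usually trades recall for precision.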
