Extracting domain-specific terms using contextual word embeddings
Andraž Repar, Nada Lavrač, Senja Pollak

TL;DR
This paper introduces a machine learning method that combines features from traditional term extraction systems with features derived from contextual word embeddings to improve domain-specific term extraction for Slovenian, reporting performance gains across the four domains of the RSDO5 corpus.
Contribution
The study presents a novel approach integrating contextual embeddings with traditional features for better terminology extraction in Slovenian, outperforming existing methods.
Findings
Reported F1 scores improve on previous state-of-the-art term extraction approaches for Slovenian.
Context-based features derived from contextual word embeddings improve term extraction accuracy.
Evaluated on the four domains of the RSDO5 corpus (biomechanics, linguistics, chemistry, veterinary).
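Since the findings are stated in terms of F1, a minimal sketch of how precision, recall and F1 can be computed for a set of extracted terms against a gold-standard term list may help. The function name and the set-based exact matching are assumptions made for illustration; the paper's exact evaluation protocol is not described on this page.

```python
def precision_recall_f1(extracted, gold):
    """Score extracted terms against gold terms using exact set matching.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so an extractor scores well only if it both finds most gold terms and proposes few spurious ones.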
Abstract
Automated terminology extraction refers to the task of extracting meaningful terms from domain-specific texts. This paper proposes a novel machine learning approach to terminology extraction, which combines features from traditional term extraction systems with novel contextual features derived from contextual word embeddings. Instead of using a predefined list of part-of-speech patterns, we first analyse a new term-annotated corpus, RSDO5, for the Slovenian language and devise a set of rules for term candidate selection, and then generate statistical, linguistic and context-based features. We use a support-vector machine algorithm to train a classification model, evaluate it on the four domains (biomechanics, linguistics, chemistry, veterinary) of the RSDO5 corpus and compare the results with state-of-the-art term extraction approaches for the Slovenian language. Our approach provides…
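The candidate selection and feature generation steps described in the abstract could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the toy data layout, the function names, the frequency threshold and the feature choices are all hypothetical, and the context vector stands in for whatever is derived from the contextual word embeddings. The classifier itself is omitted; feature vectors like these would then be passed to a support-vector machine.

```python
from collections import Counter

# Hypothetical toy data, NOT the RSDO5 format: each sentence is a list of
# (token, POS) pairs and each annotated term is a (start, end) token span.
SENTENCES = [
    [("the", "DET"), ("support", "NOUN"), ("vector", "NOUN"),
     ("machine", "NOUN"), ("works", "VERB")],
    [("a", "DET"), ("contextual", "ADJ"), ("embedding", "NOUN"),
     ("helps", "VERB")],
    [("the", "DET"), ("neural", "ADJ"), ("network", "NOUN"),
     ("learns", "VERB")],
]
TERM_SPANS = [[(1, 4)], [(1, 3)], [(1, 3)]]


def derive_patterns(sentences, term_spans, min_count=1):
    """Keep POS sequences of annotated terms seen at least min_count times."""
    counts = Counter()
    for sentence, spans in zip(sentences, term_spans):
        for start, end in spans:
            counts[tuple(pos for _, pos in sentence[start:end])] += 1
    return {pattern for pattern, n in counts.items() if n >= min_count}


def select_candidates(sentence, patterns):
    """Return every token span whose POS sequence matches a derived pattern."""
    max_len = max((len(p) for p in patterns), default=0)
    found = []
    for start in range(len(sentence)):
        last_end = min(start + max_len, len(sentence))
        for end in range(start + 1, last_end + 1):
            if tuple(pos for _, pos in sentence[start:end]) in patterns:
                found.append(" ".join(tok for tok, _ in sentence[start:end]))
    return found


def features(candidate, corpus_counts, context_vector):
    """Concatenate illustrative statistical, linguistic and context features."""
    frequency = float(corpus_counts[candidate])   # statistical (assumed)
    length = float(len(candidate.split()))        # linguistic (assumed)
    return [frequency, length] + list(context_vector)  # context-based (assumed)
```

Deriving the patterns from annotated spans, rather than hard-coding them, mirrors the idea of analysing the corpus first; raising `min_count` trades candidate recall for fewer spurious candidates.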
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
Methods: Sparse Evolutionary Training
