Improving Term Extraction with Terminological Resources
Sophie Aubin (LIPN), Thierry Hamon (LIPN)

TL;DR
This paper introduces a method that leverages external terminologies to enhance term extraction accuracy in specialized domains, addressing limitations of existing extractors in highly technical texts.
Contribution
It presents a novel approach integrating terminological resources into the extraction process, improving the quantity and reliability of extracted terms in biomedical texts.
Findings
Increased number of term candidates extracted
Higher reliability of term extraction results
Effective integration of terminologies at multiple processing steps
Abstract
Studies of different term extractors on a biomedical corpus revealed that their performance decreases when they are applied to highly technical texts. The difficulty or impossibility of customising them to new domains is an additional limitation. In this paper, we propose to use external terminologies to influence generic linguistic data in order to improve the quality of the extraction. The tool we implemented exploits testified terms at different steps of the process: chunking, parsing and extraction of term candidates. Experiments reported here show that, using this method, more term candidates can be acquired with a higher level of reliability. We further describe the extraction process involving endogenous disambiguation implemented in the term extractor YaTeA.
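To make the idea of exploiting testified terms during chunking more concrete, the following is a minimal sketch and not YaTeA's actual implementation. It assumes the external terminology is available as a set of lowercased token tuples, and it uses a simple greedy longest-match strategy to group tokens that form a known term into a single unit that later steps would not split. The function name `mark_testified_terms` and the sample data are hypothetical.

```python
from typing import List, Set, Tuple


def mark_testified_terms(
    tokens: List[str],
    terminology: Set[Tuple[str, ...]],
) -> List[Tuple[str, ...]]:
    """Group tokens into units, keeping known terms together.

    Illustrative assumption: matching is case-insensitive and greedy,
    preferring the longest terminology entry starting at each position.
    """
    # Longest entry in the terminology bounds the search window.
    max_len = max((len(term) for term in terminology), default=1)
    units: List[Tuple[str, ...]] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest possible window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tuple(tok.lower() for tok in tokens[i:i + n])
            if window in terminology:
                match = tuple(tokens[i:i + n])
                break
        if match is None:
            # No known term starts here: emit the token on its own.
            match = (tokens[i],)
        units.append(match)
        i += len(match)
    return units


# Hypothetical sample terminology and sentence fragment.
sample_terms = {("sigma", "factor"), ("sigma", "factor", "gene")}
sample_tokens = "expression of the sigma factor gene".split()
sample_units = mark_testified_terms(sample_tokens, sample_terms)
```

In this sketch the longer entry wins over the shorter one it contains, so the three-token term is kept whole. A real extractor would also need to handle inflection, part-of-speech information and overlapping matches, which are left out here.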
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Linguistics and Terminology Studies · Natural Language Processing Techniques
