Unsupervised Technical Domain Terms Extraction using Term Extractor

Suman Dowlagar; Radhika Mamidi

arXiv:2101.09015 · cs.CL · January 25, 2021 · 1 citation



TL;DR

This paper presents an unsupervised method for extracting domain-specific terms from text corpora, utilizing chunking, preprocessing, and ranking based on relevance and cohesion, aimed at improving automatic terminology extraction.

Contribution

It introduces a novel unsupervised approach that combines chunking, preprocessing, and ranking functions for domain term extraction, tailored for the TermTraction shared task.
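The candidate-generation half of such a pipeline (chunking followed by preprocessing) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes tokens already carry Universal-POS-style tags from some external tagger, treats a maximal run of adjectives and nouns that ends in a noun as a chunk, and uses a small hypothetical stopword list and filter thresholds (`max_len`, `min_chars`).

```python
from collections import Counter
from typing import Iterable

# Hypothetical stopword list; a real system would use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "for", "to", "with"}

# Assumed Universal-POS-style tags produced by an external tagger.
HEAD_TAGS = {"NOUN", "PROPN"}
CHUNK_TAGS = {"ADJ", "NOUN", "PROPN"}


def chunk_candidates(tagged: list[tuple[str, str]], max_len: int = 4) -> list[str]:
    """Chunking: collect maximal ADJ/NOUN runs that end in a noun."""
    candidates: list[str] = []
    run: list[tuple[str, str]] = []
    # A sentinel tag flushes the final run at the end of the input.
    for token, tag in tagged + [("", "END")]:
        if tag in CHUNK_TAGS:
            run.append((token, tag))
            continue
        # Drop trailing adjectives so every chunk is headed by a noun.
        while run and run[-1][1] not in HEAD_TAGS:
            run.pop()
        if run and len(run) <= max_len:
            candidates.append(" ".join(tok for tok, _ in run))
        run = []
    return candidates


def preprocess(candidates: Iterable[str], min_chars: int = 3) -> Counter:
    """Preprocessing: normalise candidates and count the survivors."""
    counts: Counter = Counter()
    for cand in candidates:
        words = cand.lower().split()
        # Strip stopwords that tagging errors may leave at either edge.
        while words and words[0] in STOPWORDS:
            words.pop(0)
        while words and words[-1] in STOPWORDS:
            words.pop()
        term = " ".join(words)
        # Keep only reasonably long, purely alphabetic terms.
        if len(term) >= min_chars and all(w.isalpha() for w in words):
            counts[term] += 1
    return counts


if __name__ == "__main__":
    tagged = [("The", "DET"), ("neural", "ADJ"), ("network", "NOUN"),
              ("uses", "VERB"), ("gradient", "NOUN"), ("descent", "NOUN"),
              (".", "PUNCT"), ("Gradient", "NOUN"), ("descent", "NOUN"),
              ("is", "AUX"), ("fast", "ADJ"), (".", "PUNCT")]
    print(preprocess(chunk_candidates(tagged)))
    # Counter({'gradient descent': 2, 'neural network': 1})
```

Lowercasing merges "Gradient descent" with "gradient descent"; a fuller pipeline would probably also lemmatise so that singular and plural forms merge, which this sketch omits.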

Findings

01

Effective at extracting relevant domain terms

02

Outperforms baseline methods in the shared task

03

Demonstrates robustness across different domains

Abstract

Terminology extraction, also known as term extraction, is a subtask of information extraction whose goal is to automatically extract relevant words or phrases from a given corpus. This paper presents an unsupervised method for automated domain term extraction that combines chunking, preprocessing, and the ranking of domain-specific terms with relevance and cohesion functions. The method was developed for ICON 2020 shared task 2: TermTraction.
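Ranking cleaned candidates with relevance and cohesion functions can be sketched as follows. The scoring is modeled loosely on the TermExtractor system of Sclano and Velardi, which the title presumably refers to: relevance combines a domain-pertinence score (frequency in the target domain relative to hypothetical contrastive domains) and a domain-consensus score (entropy of the term's spread across target documents), while lexical cohesion rewards multi-word terms whose words mostly occur together. The exact formulas, the +1 smoothing inside the cohesion log, the equal weights `alpha`, `beta`, `gamma`, and the contrastive corpora are all assumptions for illustration, not the paper's settings.

```python
import math
from collections import Counter


def domain_pertinence(term: str, target: Counter, contrast: list[Counter]) -> float:
    """Relevance (pertinence): target frequency over the max frequency in any domain."""
    tf = target[term]
    peak = max([tf] + [domain[term] for domain in contrast])
    return tf / peak if peak else 0.0


def domain_consensus(term: str, docs: list[Counter]) -> float:
    """Relevance (consensus): entropy of the term's spread over target documents."""
    freqs = [doc[term] for doc in docs if doc[term] > 0]
    total = sum(freqs)
    if total == 0:
        return 0.0
    return -sum((f / total) * math.log(f / total) for f in freqs)


def lexical_cohesion(term: str, term_freq: Counter, word_freq: Counter) -> float:
    """Cohesion: term frequency relative to the frequencies of its component words."""
    words = term.split()
    f = term_freq[term]
    denom = sum(word_freq[w] for w in words)
    if f == 0 or denom == 0:
        return 0.0
    # log(1 + f) is an assumed smoothing so single occurrences do not score 0.
    return len(words) * f * math.log(1 + f) / denom


def rank_terms(target_docs: list[Counter], contrast: list[Counter], word_freq: Counter,
               alpha: float = 1 / 3, beta: float = 1 / 3,
               gamma: float = 1 / 3) -> list[tuple[str, float]]:
    """Score every target-domain candidate and sort by descending score."""
    target = sum(target_docs, Counter())  # pooled target-domain term counts
    scores = {
        term: alpha * domain_pertinence(term, target, contrast)
        + beta * domain_consensus(term, target_docs)
        + gamma * lexical_cohesion(term, target, word_freq)
        for term in target
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    target_docs = [Counter({"gradient descent": 2, "annual report": 1}),
                   Counter({"gradient descent": 1})]
    contrast = [Counter({"annual report": 5})]  # hypothetical general-domain counts
    word_freq = Counter({"gradient": 3, "descent": 3, "annual": 4, "report": 6})
    for term, score in rank_terms(target_docs, contrast, word_freq):
        print(f"{term}: {score:.3f}")
```

With these toy counts, the domain-specific "gradient descent" ranks above "annual report", which is more frequent in the contrastive corpus. Because the entropy and cohesion components are unbounded, a practical version would likely normalise each one before weighting.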


Code & Models

Repositories

kevinlu1248/pyate


Taxonomy

Topics: Advanced Text Analysis Techniques · Natural Language Processing Techniques · Semantic Web and Ontologies