An Algorithm to Self-Extract Secondary Keywords and Their Combinations Based on Abstracts Collected using Primary Keywords from Online Digital Libraries

Natarajan Meghanathan; Nataliya Kostyuk; Raphael Isokpehi; Hari Cohly

arXiv:1006.1184·cs.IR·July 15, 2010

TL;DR

This paper presents an algorithm that automatically extracts secondary keywords and their combinations from abstracts collected via primary keywords, reducing user input over time and enabling efficient keyword analysis across large datasets.

Contribution

The paper introduces a novel algorithm that self-extracts secondary keywords and their combinations from abstracts, minimizing user intervention as dataset size increases.

Findings

01

The share of words that require a user query decreases as more abstracts are parsed.

02

Effective extraction of secondary keywords and combinations.

03

Applicable to large digital library collections.

Abstract

The high-level contribution of this paper is the development and implementation of an algorithm to self-extract secondary keywords and their combinations (combo words) based on abstracts collected using standard primary keywords for research areas from reputed online digital libraries such as IEEE Xplore and PubMed Central. Given a collection of N abstracts, we arbitrarily select M abstracts (M << N; M/N as low as 0.15) and parse each of the M abstracts, word by word. Upon the first appearance of a word, we query the user to classify the word into an Accept-List or a non-Accept-List. The effectiveness of the training approach is evaluated by measuring the percentage of words for which the user is queried for classification when the algorithm parses through the words of each of the M abstracts. We observed that as M grows larger, the percentage of words for which the user is…
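The training phase described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `tokenize`, `train`, and `ask_user` are hypothetical, the tokenizer (lowercased alphabetic runs) is an assumption, and the per-abstract percentage is assumed to be computed over all word tokens in that abstract. The `ask_user` callback stands in for the interactive user query.

```python
import re
from typing import Callable, Iterable


def tokenize(text: str) -> list[str]:
    # Assumption: words are lowercased runs of letters; the paper's
    # actual tokenization rules are not given in the abstract.
    return re.findall(r"[a-z]+", text.lower())


def train(
    abstracts: Iterable[str],
    ask_user: Callable[[str], bool],
) -> tuple[set[str], set[str], list[float]]:
    """Parse the training abstracts word by word.

    On the first appearance of a word, ask_user(word) decides whether it
    goes into the Accept-List (True) or the non-Accept-List (False).
    Returns both lists and, per abstract, the percentage of its words
    that triggered a user query.
    """
    accept_list: set[str] = set()
    non_accept_list: set[str] = set()
    query_pct: list[float] = []

    for abstract in abstracts:
        words = tokenize(abstract)
        queried = 0
        for word in words:
            if word in accept_list or word in non_accept_list:
                continue  # already classified, no query needed
            queried += 1
            if ask_user(word):
                accept_list.add(word)
            else:
                non_accept_list.add(word)
        query_pct.append(100.0 * queried / len(words) if words else 0.0)

    return accept_list, non_accept_list, query_pct


# Made-up sample text and a scripted "user" for demonstration only.
sample = [
    "Gold nanoparticles induce toxicity",
    "Gold nanoparticles and toxicity in cells",
]
rejected_by_user = {"and", "in"}
accept, non_accept, pct = train(sample, lambda w: w not in rejected_by_user)
```

In this toy run every word of the first abstract triggers a query, while only half of the words in the second do, which mirrors the trend the abstract describes for growing M. In an interactive setting, `ask_user` could wrap a prompt such as `input()`.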
