Using citation networks to evaluate the impact of text length on the identification of relevant concepts
Jorge A. V. Tohalino, Thiago C. Silva, Diego R. Amancio

TL;DR
This study investigates how text length influences keyword extraction accuracy by comparing citation network-based methods with text clustering approaches, revealing that text source significantly impacts performance.
Contribution
It introduces a network-based evaluation of keyword extraction from abstracts versus full papers, highlighting the importance of text source in concept identification.
Findings
Citation-based methods yield similar accuracy levels
Text clustering outperforms citation-based approaches
Different sources lead to significant performance differences
Abstract
The identification of the most significant concepts in unstructured data is of critical importance in various practical applications. Despite the large number of methods that have been put forth to extract the main topics of texts, a limited number of studies have probed the impact of the text length on the performance of keyword extraction (KE) methods. In this study, we adopted a network-based approach to evaluate whether keywords extracted from paper abstracts are compatible with keywords extracted from full papers. We employed a community detection method to identify groups of related papers in citation networks. These paper clusters were then employed to extract keywords from abstracts. Our results indicate that while the various community detection methods employed in our KE approach yielded similar levels of accuracy, a correlation analysis revealed that these methods produced…
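The pipeline outlined in the abstract (group papers through the citation network, extract keywords per group, then compare abstract-derived keywords with full-text ones) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: connected components stand in for a real community detection method, a TF-IDF-like score stands in for the keyword ranking, Jaccard overlap stands in for the accuracy measure, and the papers, citations, and texts are invented toy data.

```python
from collections import Counter, defaultdict
import math
import re


def connected_components(edges, nodes):
    """Group papers linked by citations.

    Assumption: a crude stand-in for a community detection method.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cluster_keywords(clusters, texts, top_k=3):
    """Rank words per cluster with a TF-IDF-like score.

    Each cluster is treated as one document, so words shared by
    every cluster score zero and fall to the bottom of the ranking.
    """
    counts = [Counter(w for p in c for w in tokenize(texts[p])) for c in clusters]
    doc_freq = Counter(w for cnt in counts for w in cnt)
    n = len(clusters)
    result = []
    for cnt in counts:
        scores = {w: tf * math.log(n / doc_freq[w]) for w, tf in cnt.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:top_k])
    return result


def jaccard(a, b):
    """Overlap between two keyword lists, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical toy corpus: four papers, two citation links.
edges = [("p1", "p2"), ("p3", "p4")]
abstracts = {
    "p1": "keyword extraction from text networks",
    "p2": "keyword ranking in text networks",
    "p3": "protein folding simulation",
    "p4": "protein structure simulation",
}
full_texts = {
    "p1": "keyword extraction from text networks using graph centrality measures",
    "p2": "keyword ranking in text networks with graph centrality and word statistics",
    "p3": "protein folding simulation with molecular dynamics and energy models",
    "p4": "protein structure simulation with molecular dynamics on large systems",
}

clusters = connected_components(edges, abstracts.keys())
kw_abstract = cluster_keywords(clusters, abstracts)
kw_full = cluster_keywords(clusters, full_texts)

# One agreement score per cluster: abstract keywords vs. full-text keywords.
agreement = [jaccard(a, f) for a, f in zip(kw_abstract, kw_full)]
print(agreement)
```

A low agreement score for a cluster would suggest, under these assumptions, that the abstracts alone do not surface the same concepts as the full texts.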
Taxonomy
Topics: Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies · Information Retrieval and Search Behavior
