Unsupervised Learning Algorithms for Keyword Extraction in an Undergraduate Thesis
Fred Torres-Cruz, Edelfre Flores, William E. Arcaya, Irenio L. Chagua, Marga I. Ingaluque

TL;DR
This paper evaluates nine unsupervised machine learning algorithms for keyword extraction from research project records, finding TF-IDF to be the most effective with 72% accuracy and efficient processing times.
Contribution
It compares multiple unsupervised algorithms for keyword extraction in academic research data, highlighting TF-IDF's superior performance.
Findings
TF-IDF achieved 72% accuracy in keyword extraction.
TF-IDF had an average processing time of 0.4786 seconds per record.
Nine unsupervised models were evaluated on 7430 research records.
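To make the TF-IDF finding concrete, here is a minimal sketch of TF-IDF keyword ranking over a small set of records. It is not the authors' implementation. The function names `tfidf_keywords` and `keyword_hit_rate`, the smoothed IDF formula, the regex tokenizer, and the "at least one keyword matches" scoring rule are all assumptions made for illustration; the summary does not say how the 72% accuracy was computed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only, no stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
        scores = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


def keyword_hit_rate(predicted, reference):
    """Hypothetical metric: share of records where at least one
    predicted keyword appears among the reference keywords."""
    hits = sum(1 for p, r in zip(predicted, reference) if set(p) & set(r))
    return hits / len(predicted) if predicted else 0.0


if __name__ == "__main__":
    records = ["apple apple banana", "banana cherry"]
    predicted = tfidf_keywords(records, top_k=1)
    print(predicted)
    print(keyword_hit_rate(predicted, [["apple"], ["banana"]]))
```

In practice the records would be the titles and summaries stored in the platform, and the reference keywords would be those entered by the students. A stop-word list suited to the language of the records would likely be needed before the scores become meaningful.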
Abstract
The amount of data managed by many academic institutions has increased in recent years, particularly the research work produced by undergraduate students, who often select keywords using purely empirical techniques, overlooking existing technical methods that could assist in this process. Information and communication technologies, such as the Platform for Integrated Research and Academic Work with Responsibility (PILAR), which records information about research projects, such as titles, summaries, and keywords in their various modalities, have gained relevance and importance in managing this data. We tested algorithms on these research project records, and predictions were made with each of the nine (09) unsupervised machine learning models implemented for each of the 7430 records in the dataset. The most…
