Similarity-based fuzzy clustering scientific articles: potentials and challenges from mathematical and computational perspectives
Vu Thi Huong, Ida Litzel, and Thorsten Koch

TL;DR
This paper explores the potentials and challenges of similarity-based fuzzy clustering for scientific articles, focusing on mathematical foundations and computational strategies to handle large-scale publication databases.
Contribution
It establishes second-order optimality conditions, which provide new theoretical insights, and proposes GPU-accelerated solution methods for large-scale fuzzy clustering.
Findings
Established second-order optimality conditions.
Developed GPU-based accelerated algorithms.
Addressed challenges of large-scale publication data.
Abstract
Fuzzy clustering, which allows an article to belong to multiple clusters with soft membership degrees, plays a vital role in analyzing publication data. This problem can be formulated as a constrained optimization model, where the goal is to minimize the discrepancy between the similarity observed from data and the similarity derived from a predicted distribution. While this approach benefits from leveraging state-of-the-art optimization algorithms, tailoring them to work with real, massive databases like OpenAlex or Web of Science - containing about 70 million articles and a billion citations - poses significant challenges. We analyze potentials and challenges of the approach from both mathematical and computational perspectives. Among other things, second-order optimality conditions are established, providing new theoretical insights, and practical solution methods are proposed by…
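The abstract does not spell out the optimization model, so the following is only a minimal sketch under stated assumptions. It assumes the similarity derived from a membership matrix U (one row per article, each row a distribution over k clusters) is U Uᵀ, and that the discrepancy is the squared Frobenius norm of S − U Uᵀ. The solver is plain projected gradient descent with backtracking on the CPU, a generic stand-in and not the GPU-accelerated method the paper proposes. All function names and the toy similarity matrix are hypothetical.

```python
import numpy as np

def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, k = V.shape
    sorted_desc = -np.sort(-V, axis=1)             # each row sorted descending
    cumsum = np.cumsum(sorted_desc, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    cond = sorted_desc - cumsum / idx > 0
    rho = cond.sum(axis=1)                         # size of the support, always >= 1
    theta = cumsum[np.arange(n), rho - 1] / rho    # per-row shift
    return np.maximum(V - theta[:, None], 0.0)

def objective(S, U):
    """Assumed discrepancy: squared Frobenius norm of S - U U^T."""
    R = S - U @ U.T
    return float(np.sum(R * R))

def fuzzy_cluster(S, k, iters=200, step=1.0, seed=0):
    """Projected gradient descent with step halving (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    U = project_rows_to_simplex(rng.random((n, k)))
    f = objective(S, U)
    for _ in range(iters):
        grad = -4.0 * (S - U @ U.T) @ U            # gradient for symmetric S
        t = step
        while t > 1e-12:
            U_new = project_rows_to_simplex(U - t * grad)
            f_new = objective(S, U_new)
            if f_new <= f:                         # accept any non-increasing step
                break
            t *= 0.5
        else:
            break                                  # no acceptable step found
        U, f = U_new, f_new
    return U, f

# Toy data (made up): six articles forming two groups of three.
S = np.kron(np.eye(2), np.ones((3, 3)))
U, f = fuzzy_cluster(S, k=2)
print(np.round(U, 3))
print("final discrepancy:", f)
```

Each row of the returned U can be read as the soft membership degrees of one article. The dense n × n products used here would not scale to a database with tens of millions of articles, which is the kind of computational challenge the abstract points to.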
Taxonomy
Topics: Advanced Clustering Algorithms Research · Complex Network Analysis Techniques · Facility Location and Emergency Management
