An experimental comparison of label selection methods for hierarchical document clusters
Maria Fernanda Moura, Fabiano Fernandes dos Santos, and Solange Oliveira Rezende

TL;DR
This paper evaluates sixteen label selection methods for hierarchical document clustering, comparing their effectiveness in capturing relevant information and coherence across five datasets using a comprehensive, standardized methodology.
Contribution
It introduces a standardized notation and evaluation framework for comparing label selection methods in hierarchical clustering, highlighting the effectiveness of methods that consider hierarchical relations.
Findings
Methods that consider hierarchical relations perform best.
Certain methods effectively capture specific information.
Hierarchical coherence improves label quality.
Abstract
The focus of this paper is on the evaluation of sixteen labeling methods for hierarchical document clusters over five datasets. All of the methods are independent from clustering algorithms, applied subsequently to the dendrogram construction and based on probabilistic dependence relations among labels and clusters. To reach a fair comparison as well as a standard benchmark, we rewrote and presented the labeling methods in a similar notation. The experimental results were analyzed through a proposed evaluation methodology based on: (i) data standardization before applying the cluster labeling methods and over the labeling results; (ii) a particular information retrieval process, using the obtained labels and their hierarchical relations to construct the search queries; (iii) evaluation of the retrieval process through precision, recall and F measure; (iv) variance analysis of the…
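Steps (ii) and (iii) of the methodology can be illustrated with a minimal sketch. It is not the authors' implementation: the AND-style query built from a node's labels plus its ancestors' labels, the toy documents, and the function names `retrieve` and `precision_recall_f` are all assumptions made for illustration. Only the use of labels with their hierarchical relations as queries, and the precision, recall and F measures, come from the abstract.

```python
def retrieve(docs, query_terms):
    """Return ids of documents containing every query term.

    Assumption: conjunctive (AND) matching; the paper's exact query
    construction is not given in the abstract.
    """
    return {doc_id for doc_id, terms in docs.items() if query_terms <= terms}


def precision_recall_f(retrieved, relevant):
    """Compute precision, recall and balanced F measure for one query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical toy collection: document id -> set of terms.
docs = {
    "d1": {"cluster", "label", "hierarchy"},
    "d2": {"cluster", "label"},
    "d3": {"image", "retrieval"},
    "d4": {"cluster", "hierarchy"},
}

# Hypothetical labels selected for a parent node and one of its children.
parent_labels = {"cluster"}
node_labels = {"hierarchy"}

# Assumed query: the node's labels combined with its ancestors' labels.
query = parent_labels | node_labels

# Hypothetical ground truth: the documents that belong to the child cluster.
relevant = {"d1", "d2", "d4"}

retrieved = retrieve(docs, query)
precision, recall, f_measure = precision_recall_f(retrieved, relevant)
print(sorted(retrieved), precision, round(recall, 3), round(f_measure, 3))
```

In this toy case the query retrieves only documents from the cluster but misses one of them, so precision is high and recall is lower. That trade-off is the kind of difference between labeling methods that the F measure is meant to summarize.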
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Image Retrieval and Classification Techniques
