Clustering Words by Projection Entropy

Işık Barış Fidaner; Ali Taylan Cemgil

arXiv:1410.6830·cs.CL·October 28, 2014·1 cites

TL;DR

This paper demonstrates how entropy agglomeration (EA), a new clustering algorithm based on projection entropy, effectively groups words in a literary text by analyzing their occurrences across paragraphs.

Contribution

It introduces the application of entropy agglomeration (EA) to text clustering, showcasing its ability to capture meaningful word relationships using a novel entropy measure.

Findings

01

EA successfully clusters related words in the text.

02

The method captures significant semantic relationships.

03

The implementation, REBUS, is written in Python and published as free software.

Abstract

We apply entropy agglomeration (EA), a recently introduced algorithm, to cluster the words of a literary text. EA is a greedy agglomerative procedure that minimizes projection entropy (PE), a function that can quantify the segmentedness of an element set. To apply it, the text is reduced to a feature allocation, a combinatorial object that represents the word occurrences in the text's paragraphs. The experimental results demonstrate that EA, despite its reduction and simplicity, is useful in capturing significant relationships among the words in the text. This procedure was implemented in Python and published as free software: REBUS.
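
The procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the REBUS code, and the abstract does not state the PE formula. The sketch assumes that the feature allocation is a list of blocks, one set of words per paragraph, and that the projection entropy of a word subset S is the sum over blocks B of (|B ∩ S| / |S|) · log(|S| / |B ∩ S|), skipping blocks that do not intersect S. The function names `paragraphs_to_blocks`, `projection_entropy`, and `entropy_agglomeration` are hypothetical, as is the simple regex tokenizer.

```python
import math
import re
from itertools import combinations


def paragraphs_to_blocks(paragraphs):
    """Reduce a text to a feature allocation: one set of words per paragraph.

    Assumption: lowercase alphabetic tokens stand in for words.
    """
    return [set(re.findall(r"[a-z]+", p.lower())) for p in paragraphs]


def projection_entropy(blocks, subset):
    """Assumed PE of `subset`: sum of (k/n) * log(n/k) over blocks,
    where n = |subset| and k = |block & subset| (blocks with k = 0 are skipped).
    It is 0 when the words in `subset` always occur together.
    """
    n = len(subset)
    total = 0.0
    for block in blocks:
        k = len(block & subset)
        if k:
            total += (k / n) * math.log(n / k)
    return total


def entropy_agglomeration(blocks, words):
    """Greedy agglomeration: repeatedly merge the pair of clusters whose
    union has the lowest projection entropy.

    Returns the merges in order, each as (merged_cluster, its_entropy).
    """
    clusters = [frozenset([w]) for w in words]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: projection_entropy(blocks, pair[0] | pair[1]),
        )
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((merged, projection_entropy(blocks, merged)))
    return merges


# Toy usage: four short "paragraphs" and four words to cluster.
blocks = paragraphs_to_blocks(
    ["the cat sat", "the cat ran", "a dog barked", "a dog slept"]
)
merges = entropy_agglomeration(blocks, ["cat", "the", "dog", "a"])
```

In this toy input, words that always share a paragraph have zero entropy under the assumed formula, so they are merged before words that never co-occur. The exhaustive pair search is quadratic in the number of clusters at every step, which is acceptable for a sketch but would need pruning to a selected vocabulary for a full text.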

Taxonomy

Topics: Algorithms and Data Compression · Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models