Distributional Clustering of English Words
Fernando Pereira (AT&T Bell Laboratories), Naftali Tishby (Hebrew University), Lillian Lee (Harvard University)

TL;DR
This paper presents a method for hierarchical "soft" clustering of English words according to their distribution in particular syntactic contexts. Deterministic annealing finds the clusters, which then serve as the basis for class models of word cooccurrence that are evaluated on held-out test data.
Contribution
It introduces a novel hierarchical clustering approach using deterministic annealing for distributional word clustering, enhancing class-based language models.
Findings
Hierarchical clusters improve word class modeling.
Clusters are stable and meaningful across different annealing stages.
Models evaluated show good performance on held-out data.
Abstract
We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest distortion sets of clusters. As the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word cooccurrence, and the models are evaluated with respect to held-out test data.
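The annealing behaviour described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than something stated above: each word is represented as a probability distribution over contexts, the distortion is taken to be KL divergence, memberships are proportional to exp(-beta * distortion), centroids are membership-weighted averages, and the function names (kl, soft_assign, update_centroids, jitter, anneal) are hypothetical. The number of centroids is fixed in advance for simplicity, and a small seeded jitter is applied at each stage so that coincident centroids can separate once beta is large enough. It is not the authors' implementation.

```python
import math
import random


def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) for discrete distributions (assumed distortion)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def soft_assign(dists, centroids, beta):
    """Soft membership of each word in each cluster: exp(-beta * D), normalized."""
    memberships = []
    for p in dists:
        d = [kl(p, c) for c in centroids]
        d_min = min(d)  # shift by the minimum for numerical stability
        w = [math.exp(-beta * (di - d_min)) for di in d]
        z = sum(w)
        memberships.append([wi / z for wi in w])
    return memberships


def update_centroids(dists, memberships, old_centroids):
    """Each centroid becomes the membership-weighted average of the word distributions."""
    dim = len(dists[0])
    new = []
    for c, old in enumerate(old_centroids):
        total = sum(m[c] for m in memberships)
        if total == 0.0:  # cluster lost all weight: keep its previous centroid
            new.append(list(old))
            continue
        new.append([
            sum(m[c] * p[v] for m, p in zip(memberships, dists)) / total
            for v in range(dim)
        ])
    return new


def jitter(centroids, rng, eps=0.01):
    """Perturb centroids slightly and renormalize, so identical centroids can split."""
    out = []
    for c in centroids:
        noisy = [x * (1.0 + eps * rng.uniform(-1.0, 1.0)) for x in c]
        z = sum(noisy)
        out.append([x / z for x in noisy])
    return out


def anneal(dists, n_clusters, betas, iters=50, seed=0):
    """Run the soft-clustering updates at each beta in an increasing schedule."""
    rng = random.Random(seed)
    dim = len(dists[0])
    mean = [sum(p[v] for p in dists) / len(dists) for v in range(dim)]
    centroids = [list(mean) for _ in range(n_clusters)]  # all start at the global mean
    memberships = None
    for beta in betas:
        centroids = jitter(centroids, rng)
        for _ in range(iters):
            memberships = soft_assign(dists, centroids, beta)
            centroids = update_centroids(dists, memberships, centroids)
    return centroids, memberships
```

With a small beta the centroids collapse back onto the global mean and every word belongs to every cluster almost equally. As beta grows past a data-dependent critical value, the coincident centroids drift apart and memberships sharpen, which is the subdivision the abstract refers to. Recording the clusters at each split would give the hierarchy.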
Taxonomy
Topics: Natural Language Processing Techniques · Data Mining Algorithms and Applications · Bayesian Methods and Mixture Models
