Clustering based approach extracting collocations

Mohamed Achraf Ben Mohamed; Mounir Zrigui; Mohsen Maraoui

arXiv:1207.2714·cs.CL·July 12, 2012·2 cites

Open Access

TL;DR

This paper introduces a clustering-based method for extracting collocations from text corpora, utilizing classical measures to group bigrams and efficiently identify likely collocations by reducing the search space.

Contribution

It proposes a novel clustering approach that combines multiple classical measures to improve collocation extraction efficiency and accuracy.

Findings

01 Effective reduction of search space for collocation extraction

02 Improved accuracy in identifying true collocations

03 Demonstrated applicability on various corpora

Abstract

This study presents a collocation extraction approach based on clustering. It combines several classical measures that cover all aspects of a given corpus, then separates the bigrams found in the corpus into several disjoint groups according to how likely each group is to contain collocations. Groups in which collocations are very unlikely can then be excluded, which meaningfully reduces the search space.
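The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract names neither the classical measures nor the clustering algorithm, so the choices here are assumptions made for illustration only: pointwise mutual information, the Dice coefficient and a simplified t-score as the measures, and plain k-means as the clustering step. The function names (`bigram_features`, `standardise`, `kmeans`, `cluster_bigrams`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def bigram_features(tokens):
    """Map each bigram to (PMI, Dice, t-score). Measures are assumed, not the paper's."""
    n = len(tokens) - 1                       # number of bigram positions
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    feats = {}
    for (w1, w2), c in bi.items():
        expected = uni[w1] * uni[w2] / n      # approximate count under independence
        pmi = math.log2(c / expected)
        dice = 2 * c / (uni[w1] + uni[w2])
        t = (c - expected) / math.sqrt(c)
        feats[(w1, w2)] = (pmi, dice, t)
    return feats

def standardise(vectors):
    """Z-score each measure so that no single one dominates the distances."""
    cols = list(zip(*vectors))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(v, means, stds)) for v in vectors]

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic initialisation."""
    pts = sorted(points)
    centroids = [pts[i * (len(pts) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[j].append(p)
        # Recompute each centroid; an empty group keeps its previous centroid.
        centroids = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids

def cluster_bigrams(tokens, k=3):
    """Split bigrams into at most k disjoint groups, strongest group first."""
    feats = bigram_features(tokens)
    bigrams = list(feats)
    vecs = standardise([feats[b] for b in bigrams])
    centroids = kmeans(vecs, k)
    groups = {}
    for b, v in zip(bigrams, vecs):
        j = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
        groups.setdefault(j, []).append(b)
    # Rank groups by the sum of their standardised centroid scores.
    order = sorted(groups, key=lambda j: sum(centroids[j]), reverse=True)
    return [groups[j] for j in order]

tokens = "new york is a big city and new york is an old city and the city is big".split()
groups = cluster_bigrams(tokens, k=3)
candidates = [b for g in groups[:-1] for b in g]   # drop the weakest group
```

Dropping the lowest-ranked group is one way to realise the search-space reduction the abstract mentions; how many groups to keep, and how to rank them, would depend on the corpus and on the measures actually used.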

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Second Language Acquisition and Learning