Clustering based approach extracting collocations
Mohamed Achraf Ben Mohamed, Mounir Zrigui, Mohsen Maraoui

TL;DR
This paper introduces a clustering-based method for extracting collocations from text corpora, utilizing classical measures to group bigrams and efficiently identify likely collocations by reducing the search space.
Contribution
It proposes a novel clustering approach that combines multiple classical measures to improve collocation extraction efficiency and accuracy.
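The page does not say which classical measures the method combines. As a minimal sketch, assuming four common choices (pointwise mutual information, t-score, Dice coefficient and the log-likelihood ratio G²), each adjacent bigram can be given a feature vector built from a 2x2 contingency table of counts. The function name `bigram_measures` and the choice of measures are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def bigram_measures(tokens):
    """Score every adjacent bigram with four classical association measures
    (an assumed selection: PMI, t-score, Dice, log-likelihood ratio)."""
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)                            # total number of bigram positions
    joint = Counter(pairs)                    # c(w1 w2)
    first = Counter(w1 for w1, _ in pairs)    # c(w1 *): w1 in first position
    second = Counter(w2 for _, w2 in pairs)   # c(* w2): w2 in second position
    scores = {}
    for (w1, w2), o11 in joint.items():
        r1, c1 = first[w1], second[w2]
        # Observed 2x2 contingency table: (w1,w2), (w1,not w2), (not w1,w2), (not w1,not w2)
        observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
        # Expected counts if w1 and w2 occurred independently
        expected = [r1 * c1 / n, r1 * (n - c1) / n,
                    (n - r1) * c1 / n, (n - r1) * (n - c1) / n]
        # G^2 statistic; cells with zero observed count contribute nothing
        llr = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
        scores[(w1, w2)] = {
            "pmi": math.log2(o11 / expected[0]),
            "t_score": (o11 - expected[0]) / math.sqrt(o11),
            "dice": 2 * o11 / (r1 + c1),
            "llr": llr,
        }
    return scores

m = bigram_measures("new york is big and new york is old".split())
print(m[("new", "york")])
```

Each measure captures a different aspect of association (PMI favours rare exclusive pairs, t-score and G² favour frequent ones), which is one plausible reason to combine several of them into a single vector rather than rank by any one score.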
Findings
Effective reduction of the search space for collocation extraction
Improved accuracy in identifying true collocations
Demonstrated applicability to various corpora
Abstract
This study presents a collocation extraction approach based on a clustering technique. It combines several classical measures that together cover all aspects of a given corpus, and then separates the bigrams found in the corpus into several disjoint groups according to the probability that they contain collocations. Groups in which collocations are very unlikely can then be excluded, which significantly reduces the search space.
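The grouping and pruning steps can be sketched as follows. This is a hedged illustration, not the authors' algorithm: it assumes each bigram already has a numeric feature vector of association scores, standardizes the features, partitions the bigrams into `k` disjoint groups with k-means (farthest-first initialisation), and keeps only the group whose centroid has the highest mean standardized score. The clustering algorithm, the pruning rule and the names `kmeans`, `standardize` and `prune_search_space` are all assumptions made for illustration; the feature values in the example are invented.

```python
import math

def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def standardize(rows):
    """Z-score each feature column so no single measure dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (disjoint groups)
        labels = [min(range(k), key=lambda j: _sqdist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def prune_search_space(features, k=3):
    """Keep only the bigrams in the cluster with the highest mean score (assumed rule)."""
    bigrams = list(features)
    points = standardize([features[b] for b in bigrams])
    labels, centroids = kmeans(points, k)
    best = max(range(k), key=lambda j: sum(centroids[j]) / len(centroids[j]))
    return {b for b, lab in zip(bigrams, labels) if lab == best}

features = {  # hypothetical [llr, dice, pmi] vectors
    ("new", "york"): [9.0, 1.0, 2.0], ("prime", "minister"): [8.5, 0.9, 1.9],
    ("hot", "dog"): [8.0, 0.8, 1.8], ("of", "the"): [0.5, 0.05, 0.3],
    ("in", "a"): [0.4, 0.04, 0.2], ("and", "it"): [0.3, 0.03, 0.1],
}
print(prune_search_space(features, k=2))
```

Only the retained group would then be passed to a finer collocation filter; the discarded groups are never examined further, which is where the reduction of the search space comes from.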
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Second Language Acquisition and Learning
