Determining the Unithood of Word Sequences using Mutual Information and Independence Measure
Wilson Wong, Wei Liu, Mohammed Bennamoun

TL;DR
This paper introduces a method, independent of termhood, for measuring the unithood of word sequences using mutual information and an independence measure, reporting 95.42% accuracy on 1005 test cases.
Contribution
It presents a new unithood measurement approach that does not rely on termhood, combining parsed text analysis with Google search engine data.
Findings
Precision of 98.68% in unithood detection
Recall of 91.82% in unithood detection
Overall accuracy of 95.42%
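For readers unfamiliar with how these three figures relate, the sketch below computes precision, recall, and accuracy from a confusion matrix using their standard definitions. The counts in the usage line are hypothetical and chosen only for illustration; they are not the paper's data, and the summary does not give the underlying counts.

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts.

    tp: sequences correctly judged to be units
    fp: sequences wrongly judged to be units
    fn: true units that were missed
    tn: non-units correctly rejected
    """
    precision = tp / (tp + fp)                  # how many accepted units are real
    recall = tp / (tp + fn)                     # how many real units were accepted
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # all correct decisions
    return precision, recall, accuracy


# Hypothetical counts, for illustration only (not taken from the paper).
p, r, a = precision_recall_accuracy(tp=90, fp=10, fn=30, tn=70)
print(f"precision={p:.2%} recall={r:.2%} accuracy={a:.2%}")
```

A high precision with a somewhat lower recall, as reported here, means that few non-units were accepted while some true units were missed.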
Abstract
Most work related to unithood has been conducted as part of a larger effort to determine termhood. Consequently, the number of independent studies that examine the notion of unithood and produce dedicated techniques for measuring it is extremely small. We propose a new approach, independent of any influence of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from the Google search engine for the measurement of unithood. Our evaluations revealed a precision and recall of 98.68% and 91.82% respectively, with an accuracy of 95.42%, in measuring the unithood of 1005 test cases.
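The abstract does not give the paper's exact formulas, so the following is only a minimal sketch of the general idea of deriving mutual information from search-engine page counts. It uses the textbook pointwise mutual information, log2(p(x,y) / (p(x) p(y))), and assumes probabilities are estimated as page count divided by a total index size. The function name, its parameters, and the counts are all hypothetical, and no real search API is called.

```python
import math


def pmi_from_counts(n_xy, n_x, n_y, n_total):
    """Textbook pointwise mutual information from raw counts.

    n_xy: pages containing the two-part sequence (hypothetical input)
    n_x, n_y: pages containing each part on its own
    n_total: assumed size of the index used to normalise counts
    """
    if min(n_xy, n_x, n_y, n_total) <= 0:
        raise ValueError("all counts must be positive")
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    # Positive values: the parts co-occur more often than chance predicts.
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical page counts, for illustration only.
score = pmi_from_counts(n_xy=100, n_x=1000, n_y=1000, n_total=1_000_000)
print(round(score, 3))
```

A score near zero suggests the two parts co-occur about as often as independent words would, while a clearly positive score is statistical evidence that they form a unit. How the paper combines this with its independence measure and linguistic evidence is not specified in this summary.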
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Algorithms and Data Compression
