Determining the Unithood of Word Sequences using Mutual Information and Independence Measure

Wilson Wong; Wei Liu; Mohammed Bennamoun

arXiv:0810.0156 · cs.AI · October 2, 2008 · 1 citation

TL;DR

This paper introduces a method for measuring the unithood of word sequences, independent of termhood, that uses mutual information and an independence measure and achieves 95.42% accuracy on 1005 test cases.

Contribution

It presents a new unithood measurement approach that does not rely on termhood, combining linguistic evidence from parsed text with statistical evidence from the Google search engine.
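The title pairs mutual information with an independence measure, and the statistical evidence comes from Google. As a rough illustration only, here is a sketch of standard pointwise mutual information (PMI) computed from raw occurrence counts such as search-engine hit counts for two parts x and y of a candidate sequence and for the joined sequence. The function name `pmi_from_counts`, the use of hit counts as frequencies, the collection-size parameter `n_total`, and all numbers are assumptions; this is the textbook PMI, not the authors' exact formulation of either measure.

```python
import math


def pmi_from_counts(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Pointwise mutual information, log2(P(x, y) / (P(x) * P(y))).

    Probabilities are maximum-likelihood estimates count / n_total. The counts
    are assumed to be hit counts for x, y and the joined sequence "x y", and
    n_total an assumed estimate of the collection size (hypothetical inputs).
    """
    if min(n_xy, n_x, n_y, n_total) <= 0:
        raise ValueError("all counts must be positive")
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: the joined sequence occurs 5000 times more often
# than chance would predict, so PMI = log2(5000).
print(round(pmi_from_counts(n_xy=1_000, n_x=10_000, n_y=20_000, n_total=10**9), 2))  # → 12.29
```

Counts where `n_xy * n_total == n_x * n_y` give a PMI of 0, the value expected if the two parts occur independently; large positive values indicate the parts co-occur far more than chance, which is the kind of statistical signal a unithood measure can draw on.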

Findings

1. Precision of 98.68% in unithood detection
2. Recall of 91.82% in unithood detection
3. Overall accuracy of 95.42%
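The three figures follow the usual confusion-matrix definitions. The sketch below assumes "is a unit" is the positive class. The counts passed in are illustrative only: they were back-solved arithmetically so that they total 1005 and round to the reported percentages, and are not figures stated on this page. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float, float]:
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)                   # share of predicted units that are units
    recall = tp / (tp + fn)                      # share of true units that were found
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all cases judged correctly
    return precision, recall, accuracy


# Illustrative counts (assumed, back-solved from the reported percentages).
p, r, a = classification_metrics(tp=449, fp=6, fn=40, tn=510)
print(f"precision={p:.2%} recall={r:.2%} accuracy={a:.2%}")
# → precision=98.68% recall=91.82% accuracy=95.42%
```

Note that high precision with lower recall means few false positives but some missed units, which is consistent with the ordering of the reported figures.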

Abstract

Most work related to unithood has been conducted as part of a larger effort to determine termhood. Consequently, very little independent research studies the notion of unithood and produces dedicated techniques for measuring it. We propose a new approach, independent of any influence of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from the Google search engine for the measurement of unithood. Our evaluations revealed a precision and recall of 98.68% and 91.82%, respectively, with an accuracy of 95.42% in measuring the unithood of 1005 test cases.

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Algorithms and Data Compression