Fishing for Exactness

Ted Pedersen (Southern Methodist University, Dallas, TX)

arXiv:cmp-lg/9608010 · cmp-lg · February 3, 2008 · 97 cites

Open Access

TL;DR

This paper advocates using Fisher's exact test over traditional asymptotic tests for identifying dependent word pairs in natural language, demonstrating its superior reliability especially with sparse and skewed data.

Contribution

It proposes Fisher's exact test as a more appropriate significance test than the traditional asymptotic tests for identifying dependent bigrams, and supports the proposal with theoretical and experimental comparisons.

Findings

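1. In both theoretical and experimental comparisons, Fisher's exact test identifies dependent word pairs more reliably than the t-test, Pearson's chi-square test, and the likelihood-ratio chi-square test.

2. Fisher's exact test is better suited to the sparse, skewed data samples typical of bigram counts.

3. The approach extends to other problems in statistical natural language processing, where skewed and sparse data appear to be the rule.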

Abstract

Statistical methods for automatically identifying dependent word pairs (i.e. dependent bigrams) in a corpus of natural language text have traditionally been performed using asymptotic tests of significance. This paper suggests that Fisher's exact test is a more appropriate test due to the skewed and sparse data samples typical of this problem. Both theoretical and experimental comparisons between Fisher's exact test and a variety of asymptotic tests (the t-test, Pearson's chi-square test, and Likelihood-ratio chi-square test) are presented. These comparisons show that Fisher's exact test is more reliable in identifying dependent word pairs. The usefulness of Fisher's exact test extends to other problems in statistical natural language processing as skewed and sparse data appears to be the rule in natural language. The experiment presented in this paper was performed using PROC FREQ of…
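To make the contrast in the abstract concrete, the following is a minimal sketch of the two kinds of test applied to a 2x2 bigram contingency table. It is not the paper's implementation (the abstract states the experiment used PROC FREQ); the function names, the choice of a right-tailed test, and the toy counts are all assumptions made for illustration. The cells are assumed to be n11 (both words occur together), n12 (first word without the second), n21 (second word without the first), and n22 (neither).

```python
from math import comb


def fisher_right_tail(n11, n12, n21, n22):
    """Right-tailed Fisher's exact p-value for a 2x2 table (illustrative).

    With the row and column totals held fixed, sum the hypergeometric
    probabilities of every table whose top-left cell is >= n11.
    """
    row1 = n11 + n12
    col1 = n11 + n21
    total = n11 + n12 + n21 + n22
    tail = 0
    for k in range(n11, min(row1, col1) + 1):
        # Number of ways to obtain a table with top-left cell equal to k.
        tail += comb(row1, k) * comb(total - row1, col1 - k)
    return tail / comb(total, col1)


def pearson_chi_square(n11, n12, n21, n22):
    """Pearson's chi-square statistic for a 2x2 table (illustrative)."""
    observed = ((n11, n12), (n21, n22))
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    total = rows[0] + rows[1]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical sparse table: a bigram seen once in 1000 bigrams, where
# each of its two words occurs only in that bigram.
sparse = (1, 0, 0, 999)
exact_p = fisher_right_tail(*sparse)
chi2 = pearson_chi_square(*sparse)
print(f"exact p = {exact_p:.4f}, chi-square = {chi2:.1f}")
```

On this toy table the exact p-value is 1/1000, while the chi-square statistic is about 1000, which the asymptotic chi-square approximation would treat as far stronger evidence of dependence. One expected cell count here is 0.001, well below the sizes at which that approximation is usually considered trustworthy, which is the kind of sparse, skewed sample the abstract describes.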

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling