Accurate and Efficient Statistical Testing for Word Semantic Breadth
Yo Ehara

TL;DR
This paper introduces a Householder-aligned permutation test for accurately measuring word semantic breadth, reducing false positives and improving computational efficiency in contextualized embedding analysis.
Contribution
It proposes a novel alignment-based permutation testing method that isolates dispersion differences from directional effects in word meaning analysis.
Findings
Reduced Type-I error by 32.5% with the new method.
Achieved 23x speedup over CPU baseline.
Improved accuracy in detecting genuine semantic breadth differences.
Abstract
Measuring the breadth of a word's meaning, or its spread across contexts, has become feasible with contextualized token embeddings. A word type can be represented as a cloud of token vectors, with dispersion-based statistics serving as proxies for contextual diversity (Nagata and Tanaka-Ishii, ACL 2025). These measurements are useful for deciding appropriate sense distinctions when constructing thesauri and domain-specific dictionaries. However, when comparing the breadth of two word types, naive hypothesis testing on dispersion can be misleading: differences in semantic direction can masquerade as dispersion differences, inflating Type-I error and yielding "statistically significant" outcomes even when there is no true breadth difference. This is problematic because significance testing should distinguish genuine effects from incidental fluctuations in small-difference regimes. We…
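The abstract is cut off before the method is described, so the following is only a minimal sketch of one plausible reading of a "Householder-aligned permutation test", not the author's implementation. It assumes that each word type is given as a NumPy array of token embeddings (one row per token), that dispersion is measured as the mean squared distance to the centroid, and that alignment means reflecting one cloud so that its mean direction coincides with the other's. The function names (`householder`, `dispersion`, `aligned_permutation_test`) and the parameters `n_perm` and `seed` are hypothetical.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def householder(a, b):
    """Return the reflection H with H @ a == b for unit vectors a and b."""
    d = a - b
    n = np.linalg.norm(d)
    if n < 1e-12:  # directions already coincide, nothing to reflect
        return np.eye(a.shape[0])
    u = d / n
    return np.eye(a.shape[0]) - 2.0 * np.outer(u, u)


def dispersion(X):
    """Assumed breadth proxy: mean squared distance of rows to their centroid."""
    return float(np.mean(np.sum((X - X.mean(axis=0)) ** 2, axis=1)))


def aligned_permutation_test(X, Y, n_perm=2000, seed=0):
    """Two-sided permutation test on the dispersion difference after alignment."""
    rng = np.random.default_rng(seed)
    # Reflect Y so that its mean direction matches the mean direction of X.
    H = householder(unit(Y.mean(axis=0)), unit(X.mean(axis=0)))
    Y_aligned = Y @ H  # H is symmetric, so this reflects every row of Y
    observed = dispersion(X) - dispersion(Y_aligned)

    pooled = np.vstack([X, Y_aligned])
    n_x = len(X)
    extreme = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = dispersion(pooled[idx[:n_x]]) - dispersion(pooled[idx[n_x:]])
        if abs(stat) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

Because a Householder reflection is orthogonal, it leaves the within-cloud dispersion of the reflected word unchanged. Only the pooled, permuted groups are affected: after alignment, the gap between the two mean directions no longer inflates their spread. This sketch aligns directions only and does not equalize the norms of the two means.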
