Conservative Likelihood Ratio Estimator for Infrequent Data Slightly above a Frequency Threshold
Masato Kikuchi, Yuhi Kusakabe, Tadachika Ozono

TL;DR
This paper introduces a conservative likelihood ratio estimator for infrequent data that improves prediction accuracy near a frequency threshold while maintaining computational efficiency.
Contribution
It proposes a novel conservative estimator for low-frequency data slightly above a threshold, enhancing prediction accuracy in likelihood ratio estimation.
Findings
Improved context prediction accuracy with the new estimator.
Maintained efficiency by avoiding computation for very low frequencies.
Demonstrated effectiveness on named entity occurrence prediction.
Abstract
Naive likelihood ratio (LR) estimation from the observed frequencies of events can overestimate LRs for infrequent data. One way to avoid this problem is to apply a frequency threshold and set the estimates to zero for frequencies below it. This approach eliminates the computation of some estimates, making practical tasks that use LRs more efficient. However, it still overestimates LRs for low frequencies near the threshold. This study proposes a conservative estimator for low frequencies slightly above the threshold. Our experiment used LRs to predict the occurrence contexts of named entities from a corpus. The experimental results demonstrate that our estimator improves prediction accuracy while maintaining efficiency in the context prediction task.
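The two baseline estimators the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy counts, and the choice to apply the threshold to the numerator frequency are all assumptions. The paper's conservative estimator is not specified in this summary, so it is not reproduced here.

```python
from collections import Counter


def naive_lr(x, nu_counts, de_counts):
    """Naive LR estimate: ratio of relative frequencies (hypothetical helper)."""
    n_nu = sum(nu_counts.values())
    n_de = sum(de_counts.values())
    f_nu = nu_counts[x]  # Counter returns 0 for unseen events
    f_de = de_counts[x]
    if f_de == 0:
        # Undefined ratio: treat as 0 when the event is unseen in both samples.
        return 0.0 if f_nu == 0 else float("inf")
    return (f_nu / n_nu) / (f_de / n_de)


def thresholded_lr(x, nu_counts, de_counts, threshold):
    """Return 0 below the frequency threshold, skipping the ratio entirely.

    Assumption: the threshold is applied to the numerator frequency.
    """
    if nu_counts[x] < threshold:
        return 0.0
    return naive_lr(x, nu_counts, de_counts)


# Toy counts, invented for illustration only.
nu_counts = Counter({"a": 50, "b": 1, "c": 3})
de_counts = Counter({"a": 100, "b": 1, "c": 2, "d": 97})

lr_frequent = naive_lr("a", nu_counts, de_counts)
lr_rare = naive_lr("b", nu_counts, de_counts)
lr_rare_cut = thresholded_lr("b", nu_counts, de_counts, threshold=2)
lr_near_cut = thresholded_lr("c", nu_counts, de_counts, threshold=2)
```

In this toy data the event "b", seen only once in each sample, receives a larger naive estimate than the well-attested event "a", which illustrates the overestimation problem. With a threshold of 2, "b" is zeroed out without computing a ratio, but "c", which sits just above the threshold, still gets a large estimate built from very few observations. That remaining case is the one the proposed conservative estimator targets.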
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
