On Modeling Profiles instead of Values

Alon Orlitsky; Narayana Santhanam; Krishnamurthy Viswanathan; Junan Zhang

arXiv:1207.4175·cs.AI·July 19, 2012

Open Access

TL;DR

This paper introduces the high-profile distribution, an estimate of the distribution underlying a data sample that maximizes the probability of the observed profile rather than of the observed values, and that explains the data better when many distinct symbols are observed.

Contribution

It proposes the high-profile distribution as an alternative to maximum likelihood, establishes some of its general properties, and compares the two estimates as the number of distinct symbols varies relative to the data size.

Findings

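1. When the number of distinct symbols observed is large, the high-profile distribution explains the data better than maximum likelihood.

2. When the number of distinct symbols observed is small compared to the data size, the high-profile and maximum-likelihood distributions are roughly the same.

3. The paper establishes some general properties of the high-profile distribution.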

Abstract

We consider the problem of estimating the distribution underlying an observed sample of data. Instead of maximum likelihood, which maximizes the probability of the observed values, we propose a different estimate, the high-profile distribution, which maximizes the probability of the observed profile: the number of symbols appearing any given number of times. We determine the high-profile distribution of several data samples, establish some of its general properties, and show that when the number of distinct symbols observed is small compared to the data size, the high-profile and maximum-likelihood distributions are roughly the same, but when the number of symbols is large, the distributions differ, and high-profile better explains the data.
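To make the notion of a profile concrete, the following is a minimal Python sketch, not the authors' code. It computes the profile of a sample, the maximum-likelihood (empirical) distribution, and, by brute-force enumeration, the probability that a candidate distribution produces a given profile. The function names `profile`, `empirical_distribution`, and `profile_probability` are hypothetical, and the enumeration is only feasible for very short samples and small supports.

```python
from collections import Counter
from itertools import product
from math import prod


def profile(sample):
    """Map each multiplicity to the number of distinct symbols
    that appear exactly that many times in the sample."""
    multiplicities = Counter(sample)  # symbol -> number of appearances
    return dict(sorted(Counter(multiplicities.values()).items()))


def empirical_distribution(sample):
    """Maximum-likelihood estimate over values: relative frequencies."""
    n = len(sample)
    return {symbol: count / n for symbol, count in Counter(sample).items()}


def profile_probability(probs, target_profile, n):
    """Probability that n i.i.d. draws from `probs` have `target_profile`.

    Illustrative brute force (an assumption of this sketch, not a method
    from the paper): enumerate every length-n sequence over the support
    and add up the probabilities of those whose profile matches.
    """
    total = 0.0
    for seq in product(range(len(probs)), repeat=n):
        if profile(seq) == target_profile:
            total += prod(probs[i] for i in seq)
    return total


if __name__ == "__main__":
    sample = "ab"                      # two distinct symbols, each seen once
    observed = profile(sample)         # one multiplicity (1), shared by 2 symbols
    ml = list(empirical_distribution(sample).values())

    # Probability of the observed profile under the ML estimate
    # versus under a uniform distribution on a larger support.
    print(profile_probability(ml, observed, len(sample)))
    print(profile_probability([0.1] * 10, observed, len(sample)))
```

For the sample "ab", the maximum-likelihood estimate assigns the observed profile probability 1/2, whereas a uniform distribution over ten symbols assigns it about 0.9. This small case illustrates how a distribution with a larger support can explain the profile better than the empirical frequencies when every observed symbol is distinct.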

Taxonomy

Topics: Algorithms and Data Compression · Advanced Database Systems and Queries · Machine Learning and Algorithms