On the letter frequencies and entropy of written Marathi
Jaydeep Chipalkatti, Mihir Kulkarni

TL;DR
This paper analyzes letter frequencies in contemporary written Marathi, identifies the sets of letters that statistically predominate in large generic texts, and uses those sets to estimate the language's entropy.
Contribution
It offers a comprehensive statistical analysis of letter frequencies in contemporary written Marathi, together with an entropy estimate derived from those frequencies.
Findings
Identified statistically predominant letters in Marathi.
Estimated the entropy of Marathi based on frequency data.
Provided a foundation for linguistic and computational applications.
Abstract
We carry out a comprehensive analysis of letter frequencies in contemporary written Marathi. We determine sets of letters which statistically predominate in any large generic Marathi text, and use these sets to estimate the entropy of Marathi.
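As a rough illustration of the kind of computation the abstract describes, the sketch below counts letters in a text and computes the first-order Shannon entropy of the resulting distribution. It is not the authors' procedure. In particular, treating each Unicode code point in the Devanagari block as one "letter" is an assumption made here for simplicity, and the function names `devanagari_letters`, `letter_frequencies`, and `shannon_entropy` are hypothetical.

```python
import math
from collections import Counter


def devanagari_letters(text):
    """Return the Devanagari code points in text.

    Assumption: one code point = one "letter". The paper may define
    letters differently (for example, by grouping consonants with
    their vowel signs). Dandas and Devanagari digits (U+0964 to
    U+096F) are skipped because they are punctuation and numerals.
    """
    return [
        ch for ch in text
        if "\u0900" <= ch <= "\u097f" and not ("\u0964" <= ch <= "\u096f")
    ]


def letter_frequencies(text):
    """Map each letter to its relative frequency in text."""
    letters = devanagari_letters(text)
    if not letters:
        return {}
    total = len(letters)
    return {ch: n / total for ch, n in Counter(letters).items()}


def shannon_entropy(freqs):
    """First-order Shannon entropy, in bits per letter."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


if __name__ == "__main__":
    sample = "मराठी ही महाराष्ट्राची भाषा आहे."
    freqs = letter_frequencies(sample)
    # List letters from most to least frequent.
    for ch, p in sorted(freqs.items(), key=lambda kv: -kv[1]):
        print(f"U+{ord(ch):04X}  {p:.3f}")
    print(f"entropy: {shannon_entropy(freqs):.3f} bits per letter")
```

A single sentence is far too short to say anything about the language; a meaningful estimate needs a large and varied corpus. A first-order figure like this one also ignores dependencies between neighbouring letters, so it only bounds the per-letter entropy from above.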
Taxonomy
Topics: Handwritten Text Recognition Techniques · Algorithms and Data Compression · Authorship Attribution and Profiling
