Domain Specific Hierarchical Huffman Encoding

K.Ilambharathi; G.S.N.V.Venkata Manik; N.Sadagopan; B.Sivaselvan

arXiv:1307.0920 · cs.IT · July 4, 2013 · 5 cites

Open Access

TL;DR

This paper introduces a two-level domain-specific text compression method combining pattern detection and Huffman encoding, outperforming classical Huffman compression for domain-specific texts.

Contribution

It proposes a novel hierarchical compression framework that integrates pattern-based word-level encoding with Huffman coding, enhancing compression ratios for domain-specific data.

Findings

01 Outperforms classical Huffman compression on domain-specific texts

02 Theoretical analysis supports improved efficiency

03 Simulation results confirm better compression ratios

Abstract

In this paper, we revisit the classical data compression problem for domain-specific texts. It is well known that the classical Huffman algorithm is optimal among prefix codes and that it compresses at the character level. Since many data transfers are domain specific (for example, downloading lecture notes or web blogs), it is natural to consider compression at a coarser granularity, i.e., at the word level rather than the character level. Our framework employs a two-level compression scheme in which the first level identifies frequent patterns in the text using classical frequent pattern algorithms. The identified patterns are replaced with special strings and, to achieve a better compression ratio, each special string is ensured to be shorter than the pattern it replaces. After this transformation, on the resultant text, we employ classical Huffman data…
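The two-level scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain word-frequency counting stands in for the "classical frequent pattern algorithms", the special strings are single Unicode private-use characters (assumed absent from the input), and all function names and parameters (`substitute_patterns`, `min_count`, `max_patterns`, and so on) are hypothetical.

```python
import heapq
import re
from collections import Counter
from itertools import count

WORD = re.compile(r"[A-Za-z]{2,}")  # assumption: a "pattern" is a word of 2+ letters


def substitute_patterns(text, min_count=2, max_patterns=256):
    """Level 1: replace frequent words with one-character placeholders.

    Stand-in for frequent pattern mining; assumes the text contains no
    private-use characters (U+E000 onward).
    """
    counts = Counter(WORD.findall(text))
    frequent = [w for w, c in counts.most_common(max_patterns) if c >= min_count]
    # Each placeholder is 1 character, each pattern is >= 2 characters,
    # so the special string is always shorter than its pattern.
    table = {w: chr(0xE000 + i) for i, w in enumerate(frequent)}
    replaced = WORD.sub(lambda m: table.get(m.group(0), m.group(0)), text)
    return replaced, table


def restore_patterns(text, table):
    """Invert level 1 by expanding placeholders back into words."""
    inverse = {v: k for k, v in table.items()}
    return "".join(inverse.get(ch, ch) for ch in text)


def huffman_code(text):
    """Level 2: build a {symbol: bitstring} Huffman code from symbol counts."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(text, code):
    """Concatenate the codeword of every symbol into one bit string."""
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Decode a bit string; works because Huffman codes are prefix-free."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


# Illustrative input (made up for this sketch, not data from the paper).
sample = "the huffman code is a prefix code; the huffman tree gives the code. " * 20

level1, table = substitute_patterns(sample)
code = huffman_code(level1)
two_level_bits = encode(level1, code)
baseline_bits = encode(sample, huffman_code(sample))

# Round trip: undo Huffman first, then expand the placeholders.
assert restore_patterns(decode(two_level_bits, code), table) == sample
print(len(baseline_bits), len(two_level_bits))
```

The printed bit counts cover the encoded payload only. A fair comparison would also charge the two-level scheme for transmitting the pattern table and the code table, which this sketch ignores.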

Taxonomy

Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Advanced Data Compression Techniques