Domain Specific Hierarchical Huffman Encoding
K.Ilambharathi, G.S.N.V.Venkata Manik, N.Sadagopan, B.Sivaselvan

TL;DR
This paper introduces a two-level domain-specific text compression method combining pattern detection and Huffman encoding, outperforming classical Huffman compression for domain-specific texts.
Contribution
It proposes a novel hierarchical compression framework that integrates pattern-based word-level encoding with Huffman coding, enhancing compression ratios for domain-specific data.
Findings
Outperforms classical Huffman compression on domain-specific texts
Theoretical analysis supports improved efficiency
Simulation results confirm better compression ratios
Abstract
In this paper, we revisit the classical data compression problem for domain-specific texts. It is well known that the classical Huffman algorithm is optimal among prefix codes and that it compresses at the character level. Since many data transfers are domain specific (for example, downloading lecture notes, web blogs, etc.), it is natural to think of data compression in larger dimensions (i.e., at the word level rather than the character level). Our framework employs a two-level compression scheme in which the first level identifies frequent patterns in the text using classical frequent pattern algorithms. The identified patterns are replaced with special strings, and to achieve a better compression ratio, the length of each special string is ensured to be shorter than the length of the corresponding pattern. After this transformation, on the resultant text, we employ classical Huffman data…
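The two-level scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain word-frequency counting stands in for the "classical frequent pattern algorithms" the abstract mentions, the special strings are assumed to be single characters from the Unicode private-use area (assumed absent from the input), and the function names and the `min_count`, `min_len`, and `max_patterns` parameters are hypothetical.

```python
import heapq
from collections import Counter


def substitute_patterns(text, min_count=2, min_len=3, max_patterns=256):
    """Level 1 (stand-in): replace frequent words with one-character codes.

    Assumption: simple word counts replace a real frequent-pattern miner,
    and the input contains no private-use characters (U+E000 onward).
    """
    words = text.split(" ")
    counts = Counter(words)
    frequent = [w for w, c in counts.most_common()
                if c >= min_count and len(w) >= min_len][:max_patterns]
    # Each special string is one character, so it is always shorter
    # than the pattern (length >= min_len) it replaces.
    mapping = {w: chr(0xE000 + i) for i, w in enumerate(frequent)}
    transformed = " ".join(mapping.get(w, w) for w in words)
    return transformed, mapping


def restore_patterns(transformed, mapping):
    """Invert level 1 by mapping each special character back to its word."""
    inverse = {code: word for word, code in mapping.items()}
    return " ".join(inverse.get(w, w) for w in transformed.split(" "))


def huffman_code(text):
    """Level 2: build a character-level Huffman code table for text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, {symbol: code so far}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encoded_bits(text):
    """Number of bits needed to Huffman-encode text (code table excluded)."""
    table = huffman_code(text)
    return sum(len(table[ch]) for ch in text)
```

Calling `substitute_patterns` on a text and then `encoded_bits` on the result gives the payload size of the two-level scheme, which can be compared with `encoded_bits` on the original text. The sketch ignores the cost of transmitting the pattern dictionary and the Huffman table, so the comparison is only indicative.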
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Advanced Data Compression Techniques
