Improving PPM Algorithm Using Dictionaries
Yichuan Hu, Jianzhong (Charlie) Zhang, Farooq Khan, Ying Li

TL;DR
This paper introduces an improved PPM text compression algorithm that integrates dictionary models to encode word suffixes, significantly enhancing compression efficiency without text preprocessing and with little additional computational cost.
Contribution
The proposed method combines context and dictionary models within PPM algorithms, enabling more efficient encoding of word suffixes and improving compression performance.
Findings
Significant compression improvements over traditional character-based PPM.
Largest gains in low-order PPM configurations.
No text preprocessing or explicit switch codewords needed.
Abstract
We propose a method to improve traditional character-based PPM text compression algorithms. Treating a text file as a sequence of alternating words and non-words, our algorithm encodes non-words and word prefixes with character-based context models and encodes word suffixes with dictionary models. Because a dictionary model can encode multiple characters as a single unit, compression efficiency improves. The proposed algorithm has three advantages: 1) it requires no text preprocessing; 2) it needs no explicit codeword to signal a switch between the context and dictionary models; 3) it can be applied to any character-based PPM algorithm without incurring much additional computational cost. Test results show significant improvements over character-based PPM, especially in low-order cases.
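The word/non-word split and the prefix/suffix routing described in the abstract can be illustrated with the minimal Python sketch below. It is not the authors' implementation, and every detail is an assumption: the names `tokenize`, `plan`, and `K`, the ASCII-letter definition of a "word", the fixed prefix length, and the escape step for unseen suffixes are all hypothetical. The abstract does not say how the switch point is chosen. Here it falls after exactly `K` letters, which encoder and decoder both know, so no switch codeword is needed. The escape is a symbol of the dictionary model itself, analogous to a PPM escape. The sketch only routes tokens to models. It emits no bits and ignores decoder-side details.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: a "word" is a maximal run of ASCII letters and a
# "non-word" is a maximal run of anything else, so the two kinds alternate.
TOKEN_RE = re.compile(r"(?P<word>[A-Za-z]+)|(?P<nonword>[^A-Za-z]+)")

# Hypothetical fixed prefix length: the first K letters of each word always
# go through the character-based context model; the rest is the "suffix".
K = 2


def tokenize(text):
    """Return (kind, token) pairs where kind is 'word' or 'nonword'."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def plan(text, k=K):
    """Route each piece of `text` to a model (a sketch; no bits are produced).

    Steps in the returned list:
      ("ctx", s)          characters of s coded by the context model
      ("dict", prefix, i) suffix coded as one symbol: index i among the
                          suffixes already seen after this prefix
      ("esc", prefix)     hypothetical escape: the suffix is new and is
                          coded by the context model in the next step
    """
    suffixes = defaultdict(list)  # prefix -> suffixes in order of first appearance
    steps = []
    for kind, tok in tokenize(text):
        if kind == "nonword" or len(tok) < k:
            # Non-words and very short words stay entirely character-based.
            steps.append(("ctx", tok))
            continue
        prefix, suffix = tok[:k], tok[k:]
        steps.append(("ctx", prefix))
        seen = suffixes[prefix]
        if suffix in seen:
            # Known word: the whole suffix becomes a single dictionary symbol.
            steps.append(("dict", prefix, seen.index(suffix)))
        else:
            # New word: escape, code the suffix character by character, then learn it.
            steps.append(("esc", prefix))
            if suffix:
                steps.append(("ctx", suffix))
            seen.append(suffix)
    return steps
```

For example, in `plan("compression and compression")` the first occurrence of the word produces `("ctx", "co")`, `("esc", "co")`, `("ctx", "mpression")`. The second occurrence produces only `("ctx", "co")` and `("dict", "co", 0)`, so the nine-letter suffix is coded as a single dictionary symbol. This is the "multiple characters as a whole" effect the abstract describes.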
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Advanced Data Compression Techniques
