Improving PPM Algorithm Using Dictionaries

Yichuan Hu; Jianzhong (Charlie) Zhang; Farooq Khan; Ying Li

arXiv:1012.3790·cs.IT·March 17, 2015·1 cites

TL;DR

This paper introduces an improved PPM text compression algorithm that uses dictionary models to encode word suffixes, improving compression efficiency over character-based PPM without text preprocessing and with little additional computational cost.

Contribution

The proposed method combines context and dictionary models within PPM algorithms, enabling more efficient encoding of word suffixes and improving compression performance.

Findings

01. Significant compression improvements over traditional character-based PPM.

02. Gains are largest in low-order PPM configurations.

03. No text preprocessing and no explicit codeword to signal switches between context and dictionary models.

Abstract

We propose a method to improve traditional character-based PPM text compression algorithms. Treating a text file as a sequence of alternating words and non-words, the basic idea of our algorithm is to encode non-words and prefixes of words using character-based context models, and to encode suffixes of words using dictionary models. By using dictionary models, the algorithm can encode multiple characters as a whole, and thus enhance the compression efficiency. The advantages of the proposed algorithm are: 1) it does not require any text preprocessing; 2) it does not need any explicit codeword to identify the switch between context and dictionary models; 3) it can be applied to any character-based PPM algorithm without incurring much additional computational cost. Test results show that significant improvements can be obtained over character-based PPM, especially in low-order cases.
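To make the word/non-word split and the prefix/suffix idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the fixed `prefix_len`, the letters-only definition of a "word", and the sorted suffix list are all hypothetical choices made for this example. It also omits the PPM context models and arithmetic coding entirely, and it uses an explicit `None` fallback, whereas the paper states that no explicit switch codeword is needed.

```python
import re
from collections import defaultdict


def split_words(text):
    """Split text into alternating word / non-word tokens.

    Assumption: a "word" is a maximal run of ASCII letters.
    """
    return re.findall(r"[A-Za-z]+|[^A-Za-z]+", text)


def build_suffix_dictionary(words, prefix_len):
    """Map each prefix of length prefix_len to a sorted list of known suffixes."""
    table = defaultdict(set)
    for w in words:
        if len(w) > prefix_len:
            table[w[:prefix_len]].add(w[prefix_len:])
    return {prefix: sorted(suffixes) for prefix, suffixes in table.items()}


def encode_word(word, table, prefix_len):
    """Return (prefix, suffix_index) when the dictionary covers the suffix.

    Otherwise return (word, None), meaning the whole word would be coded
    character by character (hypothetical fallback for this sketch).
    """
    prefix, suffix = word[:prefix_len], word[prefix_len:]
    candidates = table.get(prefix, [])
    if suffix and suffix in candidates:
        return prefix, candidates.index(suffix)
    return word, None


tokens = split_words("the compression of the compressor")
table = build_suffix_dictionary(["compression", "compressor", "complete"], 4)
hit = encode_word("compressor", table, 4)   # suffix found in the dictionary
miss = encode_word("computer", table, 4)    # suffix unknown, fall back
```

In this toy setup, a dictionary hit replaces the several characters of a suffix with a single index, which is the sense in which a dictionary model can encode multiple characters as a whole. A real coder would assign probabilities to the candidate suffixes rather than emit a raw index.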

Taxonomy

Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Advanced Data Compression Techniques