An Analysis of XML Compression Efficiency
Christopher James Augeri, Barry E. Mullins, Leemon C. Baird III, Dursun A. Bulutoglu, Rusty O. Baldwin

TL;DR
This paper evaluates various XML compression methods using a new corpus and a combined efficiency metric, revealing that general-purpose compressors often outperform specialized ones depending on the context.
Contribution
It introduces a comprehensive XML test corpus and a combined efficiency metric, providing a systematic comparison of 14 compressors and insights into selection criteria.
Findings
XMill and WBXML are useful in specific cases
General-purpose compressors often outperform specialized ones
Key factors influence compressor selection
Abstract
XML simplifies data exchange among heterogeneous computers, but it is notoriously verbose and has spawned the development of many XML-specific compressors and binary formats. We present an XML test corpus and a combined efficiency metric integrating compression ratio and execution speed. We use this corpus and linear regression to assess 14 general-purpose and XML-specific compressors relative to the proposed metric. We also identify key factors when selecting a compressor. Our results show XMill or WBXML may be useful in some instances, but a general-purpose compressor is often the best choice.
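The idea of scoring a compressor on both compression ratio and execution speed can be sketched as follows. This is only an illustration: the abstract does not give the paper's formula, so `combined_score` and its `weight` parameter are hypothetical, the sample document from `sample_xml` is synthetic rather than drawn from the paper's corpus, and the three Python standard-library compressors stand in for the 14 compressors the authors tested.

```python
import bz2
import lzma
import time
import zlib


def sample_xml(records=2000):
    """Build a small synthetic XML document (not from the paper's corpus)."""
    body = b"".join(
        b'<record id="%d"><name>item%d</name></record>' % (i, i)
        for i in range(records)
    )
    return b"<records>" + body + b"</records>"


def measure(compress, data, repeats=3):
    """Return (compressed size / original size, best wall-clock seconds)."""
    best = float("inf")
    out = b""
    for _ in range(repeats):
        start = time.perf_counter()
        out = compress(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse timer.
    return len(out) / len(data), max(best, 1e-9)


def combined_score(results, weight=0.5):
    """Hypothetical combined metric, NOT the paper's definition.

    Each compressor's ratio and time are divided by the best value observed,
    then mixed with `weight`. Lower is better; 1.0 means best on both axes.
    """
    min_ratio = min(r for r, _ in results.values())
    min_time = min(t for _, t in results.values())
    return {
        name: weight * (r / min_ratio) + (1 - weight) * (t / min_time)
        for name, (r, t) in results.items()
    }


def evaluate(data):
    """Measure three general-purpose compressors on one input."""
    compressors = {
        "zlib": zlib.compress,
        "bz2": bz2.compress,
        "lzma": lzma.compress,
    }
    return {name: measure(fn, data) for name, fn in compressors.items()}


if __name__ == "__main__":
    results = evaluate(sample_xml())
    scores = combined_score(results)
    for name in sorted(scores, key=scores.get):
        ratio, seconds = results[name]
        print(f"{name}: ratio={ratio:.3f} time={seconds:.4f}s score={scores[name]:.2f}")
```

Changing `weight` shifts the ranking between compressors that favor size and those that favor speed, which is one way to see why the best choice depends on context.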
