TL;DR
This paper introduces SIMD-optimized algorithms for transcoding Unicode text between UTF-8 and UTF-16, multiplying transcoding speed on current x64 and ARM processors.
Contribution
It presents novel SIMD-based transcoding algorithms that outperform existing methods, with an open-source implementation for reproducibility.
Findings
Transcoding speed is multiplied compared with popular transcoding functions
Algorithms work efficiently on x64 and ARM architectures
Open source library available for practical use
Abstract
In software, text is often represented using Unicode formats (UTF-8 and UTF-16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state-of-the-art disks and networks. These transcoding functions make little use of the single-instruction-multiple-data (SIMD) instructions available on commodity processors. By designing transcoding algorithms for SIMD instructions, we multiply the speed of transcoding on current systems (x64 and ARM). To ensure reproducibility, we make our software freely available as an open source library.
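To make the term "transcoding" concrete, here is a minimal scalar sketch in Python that converts UTF-8 bytes to UTF-16 code units one character at a time. It is not the paper's SIMD algorithm and not the API of its library; the function name `utf8_to_utf16_units` is hypothetical, and the sketch assumes the input is already valid UTF-8.

```python
from typing import List


def utf8_to_utf16_units(data: bytes) -> List[int]:
    """Scalar UTF-8 -> UTF-16 transcoding, one character per loop iteration.

    Assumption: `data` is valid UTF-8. Validation is deliberately left out.
    """
    units: List[int] = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:      # 1 byte: 0xxxxxxx (ASCII)
            cp = b0
            i += 1
        elif b0 < 0xE0:    # 2 bytes: 110xxxxx 10xxxxxx
            cp = ((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F)
            i += 2
        elif b0 < 0xF0:    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            cp = (((b0 & 0x0F) << 12)
                  | ((data[i + 1] & 0x3F) << 6)
                  | (data[i + 2] & 0x3F))
            i += 3
        else:              # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            cp = (((b0 & 0x07) << 18)
                  | ((data[i + 1] & 0x3F) << 12)
                  | ((data[i + 2] & 0x3F) << 6)
                  | (data[i + 3] & 0x3F))
            i += 4

        if cp < 0x10000:
            # Code points in the Basic Multilingual Plane fit in one 16-bit unit.
            units.append(cp)
        else:
            # Supplementary code points become a surrogate pair.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units


if __name__ == "__main__":
    text = "héllo, 世界 🌍"
    units = utf8_to_utf16_units(text.encode("utf-8"))
    # Cross-check against Python's built-in encoder (little-endian, no BOM).
    packed = b"".join(u.to_bytes(2, "little") for u in units)
    print(packed == text.encode("utf-16-le"))
```

Each iteration branches on the leading byte and handles a single character. This per-character, branch-heavy pattern is the kind of work that makes little use of SIMD instructions, which is the gap the abstract describes.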
