Treatment of Unicode canonical decomposition among operating systems

Efstratios Rappos

arXiv:1711.10481 · cs.OH · November 30, 2017

TL;DR

This paper examines how different operating systems handle Unicode normalization, highlighting inconsistencies in treating characters with multiple representations, which impacts interoperability.

Contribution

The paper analyzes how popular operating systems handle Unicode normalization and identifies inconsistencies that affect software interoperability.

Findings

01 Different operating systems treat Unicode normalization differently

02 These inconsistencies cause interoperability challenges

03 The results highlight a need for standardized handling across operating systems

Abstract

This article shows how the text characters that have multiple representations under the Unicode standard are treated by popular operating systems. Whilst most characters have a unique representation in Unicode, some characters, such as the accented European letters, can have multiple representations due to a feature of Unicode called normalization. These characters are treated differently by popular operating systems, leading to additional challenges during interoperability of computer programs.
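
As an illustration of the multiple representations the abstract refers to, here is a minimal Python sketch using the standard `unicodedata` module. It is not taken from the paper; the helper name `same_text` and the choice of "é" as the sample character are assumptions made for this example. It shows that the precomposed and decomposed spellings of the same accented letter compare as different strings, and encode to different bytes, until both are normalized to a common form.

```python
import unicodedata

# Two canonically equivalent spellings of the letter "é".
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A plain comparison works on code points, so the two spellings differ.
raw_equal = precomposed == decomposed

# After normalizing both sides to the same form, they compare equal.
normalized_equal = same_text(precomposed, decomposed)

# The UTF-8 byte sequences differ between the two forms, which is what a
# program sees when it exchanges such text as raw bytes.
nfc_bytes = unicodedata.normalize("NFC", decomposed).encode("utf-8")
nfd_bytes = unicodedata.normalize("NFD", precomposed).encode("utf-8")

print(raw_equal, normalized_equal, nfc_bytes, nfd_bytes)
```

Normalizing both operands at the point of comparison, rather than assuming the text arrives in a particular form, is one way to keep a program's behavior independent of where the text originated.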

Taxonomy

Topics: Advanced Computational Techniques and Applications · Embedded Systems and FPGA Design