The Problem with XSD Binary Floating Point Datatypes in RDF
Jan Martin Keil, Merle Gänßinger

TL;DR
This paper highlights issues with using XSD binary floating point datatypes in RDF. An analysis of real web data shows that these datatypes often distort values, and the paper proposes XSD decimal as a more reliable alternative.
Contribution
It demonstrates the practical problems of binary floating point datatypes in RDF and advocates for using XSD decimal to improve data quality.
Findings
29%-68% of binary floating point values are distorted in real web data
Using decimal datatypes reduces data distortion and improves accuracy
Empirical analysis supports replacing binary floating point with decimal in RDF
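The accuracy finding can be illustrated with plain arithmetic. The following is a minimal sketch, not taken from the paper: it assumes that a processor maps xsd:double literals to IEEE 754 binary64 values (Python's float) and xsd:decimal literals to exact decimal values (Python's Decimal). The three literal values are hypothetical.

```python
from decimal import Decimal

# Lexical forms of three hypothetical RDF literals, read once as
# "0.1"^^xsd:double and once as "0.1"^^xsd:decimal.
lexical_values = ["0.1", "0.1", "0.1"]

# Binary floating point: each "0.1" becomes the nearest binary64 value,
# so the rounding error shows up in the sum.
double_sum = sum(float(v) for v in lexical_values)

# Decimal arithmetic: each "0.1" is represented exactly, so the sum is exact.
decimal_sum = sum(Decimal(v) for v in lexical_values)

print(double_sum)                     # 0.30000000000000004
print(decimal_sum)                    # 0.3
print(double_sum == 0.3)              # False
print(decimal_sum == Decimal("0.3"))  # True
```

An equality test such as a SPARQL FILTER comparing the sum with 0.3 would therefore behave differently depending on the datatype chosen for the data.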
Abstract
The XSD binary floating point datatypes are regularly used for precise numeric values in RDF. However, the use of these datatypes for knowledge representation can systematically impair the quality of data and, compared to the XSD decimal datatype, increases the probability of data processing producing false results. We argue why in most cases the XSD decimal datatype is better suited to represent numeric values in RDF. A survey of the actual usage of datatypes on the relevant subset of the December 2020 Web Data Commons dataset, containing 19,453,060,341 literals from real web data, substantiates the practical relevancy of the described problem: 29%-68% of binary floating point values are distorted due to the datatype.
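The abstract does not describe how a distorted value is detected. One plausible check, sketched here under our own assumptions and not necessarily the authors' method, compares the exact decimal value of a literal's lexical form with the binary value the datatype can actually store: binary64 for xsd:double and binary32 for xsd:float. The helper names `to_binary32` and `is_distorted` are hypothetical, and special values (INF, NaN) and out-of-range inputs are not handled.

```python
import struct
from decimal import Decimal


def to_binary32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE 754 binary32 value."""
    # Packing with the "f" format rounds to single precision; unpacking
    # widens the result back to a Python float without further change.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_distorted(lexical: str, datatype: str) -> bool:
    """Hypothetical check: True if the stored binary value differs from
    the exact decimal value written in the lexical form.

    Assumption: finite values only; INF, NaN and values that overflow
    binary32 are outside the scope of this sketch.
    """
    exact = Decimal(lexical)
    if datatype == "xsd:double":
        stored = float(lexical)
    elif datatype == "xsd:float":
        stored = to_binary32(float(lexical))
    else:
        raise ValueError(f"unsupported datatype: {datatype}")
    # Decimal(float) converts the binary value exactly, without rounding,
    # so any difference means the datatype changed the value.
    return Decimal(stored) != exact


if __name__ == "__main__":
    examples = [
        ("0.5", "xsd:double"),       # exactly representable in binary
        ("0.1", "xsd:double"),       # not representable in binary
        ("16777217", "xsd:float"),   # 2**24 + 1 exceeds binary32 precision
        ("16777217", "xsd:double"),  # fits in binary64 precision
    ]
    for lexical, datatype in examples:
        print(lexical, datatype, is_distorted(lexical, datatype))
```

Under these assumptions, a literal such as "0.1"^^xsd:double is counted as distorted while "0.5"^^xsd:double is not, because only values that are finite sums of powers of two survive the conversion unchanged.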
