Data Transformation Strategies to Remove Heterogeneity
Sangbong Yoo, Jaeyoung Lee, Chanyoung Yoon, Geonyeong Son, Hyein Hong, Seongbum Seo, Soobin Yim, Chanyoung Jung, Jungsoo Park, Misuk Kim, and Yun Jang

TL;DR
This paper reviews data transformation strategies to address heterogeneity caused by data format differences, emphasizing their importance in AI data preparation and highlighting challenges in selecting suitable techniques.
Contribution
It systematically categorizes and analyzes contemporary data transformation approaches for handling data heterogeneity in AI applications.
Findings
Identifies key sources of data heterogeneity.
Classifies transformation strategies based on their approaches.
Highlights challenges and considerations in strategy selection.
Abstract
Data heterogeneity is a prevalent issue, stemming from various conflicting factors, making its utilization complex. This uncertainty, particularly resulting from disparities in data formats, frequently necessitates the involvement of experts to find resolutions. Current methodologies primarily address conflicts related to data structures and schemas, often overlooking the pivotal role played by data transformation. As the utilization of artificial intelligence (AI) continues to expand, there is a growing demand for a more streamlined data preparation process, and data transformation becomes paramount. It customizes training data to enhance AI learning efficiency and adapts input formats to suit diverse AI models. Selecting an appropriate transformation technique is paramount in preserving crucial data details. Despite the widespread integration of AI across various industries,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBig Data Technologies and Applications
