A Comparative Study of Delta Parquet, Iceberg, and Hudi for Automotive Data Engineering Use Cases
Dinesh Eswararaj, Ajay Babu Nellipudi, Vandana Kollati

TL;DR
This paper compares the Delta Parquet, Iceberg, and Hudi data lakehouse formats for automotive data engineering, highlighting their strengths, tradeoffs, and suitability for use cases such as fleet management and predictive maintenance.
Contribution
It provides a comprehensive empirical analysis of these formats using real automotive telemetry data, offering practical insights for format selection and integration in automotive data pipelines.
Findings
Delta Parquet excels in ML readiness and governance.
Iceberg offers high performance for batch analytics.
Hudi is optimized for real-time ingestion and incremental processing.
Abstract
The automotive industry generates vast amounts of data from sensors, telemetry, diagnostics, and real-time operations. Efficient data engineering is critical for handling the challenges of latency, scalability, and consistency. Modern data lakehouse formats (Delta Parquet, Apache Iceberg, and Apache Hudi) offer features such as ACID transactions, schema enforcement, and real-time ingestion, combining the strengths of data lakes and warehouses to support complex use cases. This study presents a comparative analysis of Delta Parquet, Iceberg, and Hudi using real-world time-series automotive telemetry data with fields such as vehicle ID, timestamp, location, and event metrics. The evaluation considers modeling strategies, partitioning, CDC support, query performance, scalability, data consistency, and ecosystem maturity. Key findings show that Delta Parquet provides strong ML readiness and governance,…
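To make the abstract's data model concrete, the following is a minimal sketch in plain Python of a telemetry record, a daily partition key, and a CDC-style upsert. It is not the paper's implementation and does not use any of the three formats' APIs. The field names `latitude`, `longitude`, and `speed_kmh`, the helper names `partition_key` and `upsert`, and the choice of daily partitioning keyed on `(vehicle_id, timestamp)` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TelemetryEvent:
    """One telemetry row; field names beyond vehicle ID and timestamp are hypothetical."""
    vehicle_id: str
    timestamp: datetime
    latitude: float
    longitude: float
    speed_kmh: float  # stands in for the "event metrics" mentioned in the abstract


def partition_key(event: TelemetryEvent) -> str:
    """Return a Hive-style daily partition path derived from the event timestamp."""
    return f"event_date={event.timestamp.date().isoformat()}"


def upsert(table: dict, changes: list) -> dict:
    """Apply change records in order; the last record per (vehicle_id, timestamp) wins."""
    merged = dict(table)  # copy so the input table is left untouched
    for event in changes:
        merged[(event.vehicle_id, event.timestamp)] = event
    return merged


ts = datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc)
original = TelemetryEvent("VIN-001", ts, 48.137, 11.575, 62.0)
corrected = TelemetryEvent("VIN-001", ts, 48.137, 11.575, 64.5)

table = upsert({}, [original])
table = upsert(table, [corrected])  # a late correction replaces the earlier row
```

In a lakehouse table the same idea would typically be expressed as a merge or upsert on a record key, with the partition column controlling how files are laid out on storage.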
