Automating Date Format Detection for Data Visualization
Zixuan Liang

TL;DR
This paper introduces two algorithms that automatically detect date formats in data columns, reaching over 90% accuracy and speeding up data preparation for visualization.
Contribution
The paper presents novel entropy-based and natural language modeling algorithms for automatic date format detection, enhancing data visualization workflows.
Findings
Over 90% accuracy on a large corpus of data columns
Minimum entropy method is fast enough to provide interactive feedback
Methods are suitable for integration into visualization tools
Abstract
Data preparation, specifically date parsing, is a significant bottleneck in analytic workflows. To address this, we present two algorithms, one based on minimum entropy and the other on natural language modeling, that automatically derive date formats from string data. These algorithms achieve over 90% accuracy on a large corpus of data columns, streamlining the data preparation process within visualization environments. The minimum entropy approach is particularly fast, providing interactive feedback. Our methods simplify date format extraction, making them suitable for integration into data visualization tools and databases.
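To make the task concrete, here is a minimal Python sketch of column-level date format detection. It is not either of the paper's algorithms: it is a simple candidate-voting baseline that tries a fixed list of `strptime` formats and keeps the one that parses the largest share of the column. The names `CANDIDATE_FORMATS`, `parse_rate`, and `detect_format` are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical candidate list (an assumption for this sketch);
# a real tool would need to cover many more formats.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d %b %Y"]


def parse_rate(values, fmt):
    """Return the fraction of strings in `values` that parse under `fmt`."""
    if not values:
        return 0.0
    ok = 0
    for v in values:
        try:
            datetime.strptime(v.strip(), fmt)
            ok += 1
        except ValueError:
            pass  # this value does not match the candidate format
    return ok / len(values)


def detect_format(values, candidates=CANDIDATE_FORMATS):
    """Return (best_format, parse_rate); ties go to the earlier candidate."""
    scored = [(parse_rate(values, f), f) for f in candidates]
    best_rate, best_fmt = max(scored, key=lambda s: s[0])
    return best_fmt, best_rate


# A single value such as "05/03/2021" is ambiguous between day-first and
# month-first, but the column as a whole is not: "13/02/2021" rules out
# a month-first reading.
column = ["13/02/2021", "05/03/2021", "28/11/2020"]
fmt, rate = detect_format(column)
```

The point of the sketch is that evidence from the whole column resolves ambiguity that no single string can. A fixed candidate list does not scale to the variety of formats found in real data, which is the gap that approaches deriving the format from the strings themselves aim to close.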
Taxonomy
Topics: Image Processing and 3D Reconstruction · Video Analysis and Summarization
