DataVisT5: A Pre-trained Language Model for Jointly Understanding Text and Data Visualization
Zhuoyue Wan, Yuanfeng Song, Shuaimin Li, Chen Jason Zhang, Raymond Chi-Wing Wong

TL;DR
DataVisT5 is a specialized pre-trained language model designed to understand and generate both text and data visualizations, improving automation and interpretation in data visualization tasks.
Contribution
It introduces a novel PLM tailored for data visualization, combining hybrid pre-training and multi-task fine-tuning to handle cross-modal data effectively.
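This summary does not spell out the hybrid pre-training objective. For reference, here is a minimal sketch of T5's standard span-corruption objective, which the T5 backbone is pre-trained with. It is an assumption, not the documented DataVisT5 recipe, that the hybrid objective includes something like it. The function names (`sentinel`, `span_corrupt`) and the DV-query tokens are hypothetical and only illustrative.

```python
from typing import List, Sequence, Tuple


def sentinel(i: int) -> str:
    """Sentinel token name in the style of T5's SentencePiece vocabulary."""
    return f"<extra_id_{i}>"


def span_corrupt(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """T5-style span corruption (sketch).

    Each (start, length) span is replaced by one sentinel in the input; the
    dropped tokens move to the target, each run preceded by its sentinel.
    """
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0
    for i, (start, length) in enumerate(sorted(spans)):
        if start < pos or length <= 0 or start + length > len(tokens):
            raise ValueError("spans must be positive, in range, and non-overlapping")
        inputs.extend(tokens[pos:start])  # keep the uncorrupted prefix
        inputs.append(sentinel(i))        # one sentinel per dropped span
        targets.append(sentinel(i))
        targets.extend(tokens[start:start + length])
        pos = start + length
    inputs.extend(tokens[pos:])           # keep the uncorrupted suffix
    targets.append(sentinel(len(spans)))  # closing sentinel ends the target
    return inputs, targets


# Hypothetical DV query, whitespace-tokenized for illustration only.
query = "visualize bar select name , count ( * ) from employee group by name".split()
inp, tgt = span_corrupt(query, [(1, 1), (9, 2)])
print(" ".join(inp))
print(" ".join(tgt))
```

Here the model would learn to reconstruct `bar` and `from employee` from the corrupted query. Real pre-training samples the spans at random (T5 uses roughly 15% noise density), while this sketch takes them as an argument so the behavior is easy to check.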
Findings
Outperforms state-of-the-art models on multiple DV tasks
Effective integration of text and visualization data
Enhances automation in data visualization applications
Abstract
Data visualization (DV) is a fundamental tool for efficiently conveying the insights behind big data and is widely accepted in today's data-driven world. Automating DV tasks, such as converting natural language queries into visualizations (text-to-vis), generating explanations from visualizations (vis-to-text), answering DV-related questions in free form (FeVisQA), and describing tabular data (table-to-text), is vital for advancing the field. Despite their potential, pre-trained language models (PLMs) such as T5 and BERT have seen limited use in DV because of high costs and the difficulty of handling cross-modal information, so few studies have explored PLMs for DV. We introduce DataVisT5, a novel PLM tailored for DV that enhances the T5 architecture through a hybrid objective pre-training and multi-task fine-tuning strategy,…
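Multi-task fine-tuning of T5-style models typically casts every task as text-to-text, marks each task with an input prefix, and samples training examples from the pooled datasets. The following is a minimal sketch of that setup for the four tasks named in the abstract, assuming a prefix-based format. The prefixes, the `Example` type, the function names, the DV query syntax, and the dataset sizes are all hypothetical and not DataVisT5's documented format.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical task prefixes: the summary does not give DataVisT5's actual input format.
TASK_PREFIXES: Dict[str, str] = {
    "text-to-vis": "text to vis:",
    "vis-to-text": "vis to text:",
    "fevisqa": "answer dv question:",
    "table-to-text": "table to text:",
}


@dataclass
class Example:
    task: str    # one of the keys in TASK_PREFIXES
    source: str  # model input before the prefix is added
    target: str  # expected model output


def to_text_to_text(ex: Example) -> Dict[str, str]:
    """Cast one task example into a prefixed (input, target) string pair."""
    if ex.task not in TASK_PREFIXES:
        raise KeyError(f"unknown task: {ex.task}")
    return {"input": f"{TASK_PREFIXES[ex.task]} {ex.source}", "target": ex.target}


def mixing_rates(sizes: Dict[str, int], temperature: float = 1.0) -> Dict[str, float]:
    """Temperature-scaled sampling rates: rate_m is proportional to n_m ** (1 / T).

    T = 1 samples tasks in proportion to their dataset sizes; a larger T moves
    the rates toward uniform, which up-weights the smaller tasks.
    """
    weights = {task: n ** (1.0 / temperature) for task, n in sizes.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


if __name__ == "__main__":
    ex = Example(
        task="text-to-vis",
        source="show the number of employees in each department as a bar chart",
        target="visualize bar select department , count ( * ) from employee group by department",
    )
    print(to_text_to_text(ex)["input"])
    # Illustrative dataset sizes, not figures from the paper.
    print(mixing_rates({"text-to-vis": 300, "table-to-text": 100}, temperature=2.0))
```

The temperature in `mixing_rates` follows the temperature-scaled mixing described for T5 multi-task training. This summary does not say whether DataVisT5 uses it, or with what temperature.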
Taxonomy
Topics: Data Visualization and Analytics · Computational Physics and Python Applications
Methods: Attention Is All You Need · Byte Pair Encoding · Inverse Square Root Schedule · Linear Layer · Attention Dropout · SentencePiece · Dense Connections · Dropout · Residual Connection
