JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models
Ce Chi, Xing Wang, Zhendong Wang, Xiaofan Liu, Ce Li, Zhiyan Song, Chen Zhao, Kexin Yang, Boshen Shi, Jingjing Yang, Chao Deng, Junlan Feng

TL;DR
This paper introduces JT-DA-8B, a large language model specialized for complex table reasoning, trained on a diverse dataset with a multi-step reasoning workflow, achieving strong performance in real-world scenarios.
Contribution
The work presents a new specialized large language model for table reasoning, with a comprehensive training corpus and a four-stage reasoning workflow to enhance interpretability and accuracy.
Findings
Achieves strong performance on various table reasoning tasks.
Demonstrates the effectiveness of data-centric generation and workflow optimization.
Utilizes a large, diverse dataset of 3 million tables for training.
Abstract
In this work, we present JT-DA-8B (JiuTian Data Analyst 8B), a specialized large language model designed for complex table reasoning tasks across diverse real-world scenarios. To address the lack of high-quality supervision in tabular reasoning scenarios, we construct a comprehensive and diverse training corpus covering 34 well-defined table reasoning tasks by aggregating 29 public table QA datasets and 3 million tables. We propose an automatic pipeline that generates realistic multi-step analytical tasks involving reasoning patterns. The model is built on the open-source JT-Coder-8B model, an 8B-parameter decoder-only foundation model trained from scratch. In the training stage, we leverage LLM-based scoring and workflow-aligned filtering to distill high-quality, table-centric data. Both supervised fine-tuning (SFT) and reinforcement learning (RL) are adopted to optimize our model.…
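The data-distillation step mentioned in the abstract (LLM-based scoring combined with workflow-aligned filtering) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Sample` type, the stage markers, the `toy_score` stand-in for an LLM judge, and the 0.7 threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass
class Sample:
    question: str
    response: str


# Hypothetical stage markers; the paper's actual workflow stages are not given here.
REQUIRED_MARKERS = ("[plan]", "[code]", "[answer]")


def follows_workflow(sample: Sample, markers: Sequence[str] = REQUIRED_MARKERS) -> bool:
    """Return True if every marker appears in the response, in the given order."""
    position = 0
    for marker in markers:
        found = sample.response.find(marker, position)
        if found == -1:
            return False
        position = found + len(marker)
    return True


def toy_score(sample: Sample) -> float:
    """Stand-in for an LLM judge: rewards responses that mention the table."""
    return 1.0 if "table" in sample.response.lower() else 0.0


def filter_samples(
    samples: Iterable[Sample],
    score_fn: Callable[[Sample], float],
    min_score: float = 0.7,  # assumed threshold, not from the paper
) -> List[Sample]:
    """Keep samples that follow the workflow and score at least min_score."""
    return [s for s in samples if follows_workflow(s) and score_fn(s) >= min_score]


samples = [
    Sample("q1", "[plan] read the table [code] df.sum() [answer] 42"),
    Sample("q2", "The table says 42"),                      # no workflow markers
    Sample("q3", "[answer] 42 [plan] guess [code] none"),   # markers out of order
    Sample("q4", "[plan] think [code] x = 1 [answer] 1"),   # low judge score
]
kept = filter_samples(samples, toy_score)
print([s.question for s in kept])
```

In a real pipeline the scoring function would call a judge model, and the structural check would follow whatever stages the workflow actually defines; the sketch only shows how the two filters compose.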
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Advanced Graph Neural Networks
