T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
Jie Zhang, Changzai Pan, Kaiwen Wei, Sishi Xiong, Yu Zhao, Xiangyu Li, Jiaxin Peng, Xiaoyan Gu, Jian Yang, Wenhan Chang, Zhenhe Wu, Jiang Zhong, Shuangyong Song, Yongxiang Li, Xuelong Li

TL;DR
This paper introduces T2R-bench, a bilingual benchmark for evaluating how well large language models generate detailed reports from real-world industrial tables, and shows that current models still fall short on this task.
Contribution
It presents a new benchmark with real industrial data and an evaluation framework to assess LLMs' performance in table-to-report generation tasks.
Findings
State-of-the-art LLMs score only 62.71 on the benchmark, leaving substantial room for improvement.
The benchmark covers 19 industry domains and 4 table types.
Existing models struggle with complex, diverse industrial tables.
Abstract
Extensive research has explored the capabilities of large language models (LLMs) in table reasoning. However, the essential task of transforming table information into reports remains a significant challenge for industrial applications. This task is plagued by two critical issues: 1) the complexity and diversity of tables lead to suboptimal reasoning outcomes; and 2) existing table benchmarks cannot adequately assess the practical application of this task. To fill this gap, we propose the table-to-report task and construct a bilingual benchmark named T2R-bench, in which the key information flows from the tables to the reports. The benchmark comprises 457 industrial tables, all derived from real-world scenarios and encompassing 19 industry domains as well as 4 types of industrial tables. Furthermore, we propose evaluation criteria to fairly…
