SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs
Shengzhi Li, Nima Tajbakhsh

TL;DR
SciGraphQA is a large-scale, synthetic multi-turn question-answering dataset for scientific graphs, enabling better evaluation and fine-tuning of multimodal language models in scientific contexts.
Contribution
The paper introduces SciGraphQA, the largest open-source dataset for scientific graph VQA, generated from 290,000 papers, and evaluates its utility for model assessment and fine-tuning.
Findings
Among the models evaluated, LLaVA-13B performs best on the dataset.
Including serialized data tables improves model performance.
Fine-tuning on SciGraphQA significantly enhances model accuracy.
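The second finding, that a serialized data table helps the model, can be pictured with a minimal sketch. The listing does not say how the tables are extracted from the graphs or what text format is used, so the pipe-separated layout, the prompt wording, and the function names `serialize_table` and `build_prompt` below are all assumptions made for illustration only.

```python
from typing import Sequence


def serialize_table(columns: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Flatten a table into plain text: one header line, then one line per row.

    The ' | ' separator is a hypothetical choice, not the format used in the paper.
    """
    header = " | ".join(columns)
    body = [" | ".join(str(value) for value in row) for row in rows]
    return "\n".join([header] + body)


def build_prompt(question: str, columns: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Prepend the serialized table to the question as extra text context.

    The prompt wording is illustrative; a multimodal model would also receive the image.
    """
    table_text = serialize_table(columns, rows)
    return (
        "Data table extracted from the graph:\n"
        f"{table_text}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Made-up values standing in for the data points underlying a graph.
    print(build_prompt(
        "How does accuracy change with the number of epochs?",
        ["epoch", "accuracy"],
        [[1, 0.62], [2, 0.71]],
    ))
```

The idea is that the text prompt carries the numeric values underlying the graph in addition to the image, so the model does not have to read them off the pixels alone.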
Abstract
In this work, we present SciGraphQA, a synthetic multi-turn question-answer dataset related to academic graphs. SciGraphQA is 13 times larger than ChartVQA, the previously largest chart-visual question-answering dataset. It is also the largest open-sourced chart VQA dataset with non-synthetic charts. To build our dataset, we selected 290,000 Computer Science or Machine Learning ArXiv papers published between 2010 and 2020, and then used Palm-2 to generate 295K samples of open-vocabulary multi-turn question-answering dialogues about the graphs. As context, we provided the text-only Palm-2 with paper title, abstract, paragraph mentioning the graph, and rich text contextual data from the graph itself, obtaining dialogues with an average of 2.23 question-answer turns for each graph. We asked GPT-4 to assess the matching quality of our question-answer turns given the paper's context, obtaining…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
Methods: Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer
