DePlot: One-shot visual language reasoning by plot-to-table translation

Fangyu Liu; Julian Martin Eisenschlos; Francesco Piccinno; Syrine Krichene; Chenxi Pang; Kenton Lee; Mandar Joshi; Wenhu Chen; Nigel Collier; Yasemin Altun

arXiv:2212.10505 · cs.CL · May 25, 2023 · 6 cites

TL;DR

DePlot introduces a one-shot visual language reasoning method that translates plots into tables for reasoning with large language models, significantly reducing data requirements and improving performance on chart question answering.

Contribution

This work presents the first one-shot approach to visual language reasoning by converting plots to tables for LLM reasoning, with standardized tasks and metrics.

Findings

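01. DePlot+LLM outperforms fine-tuned SOTA models by 24% on human-written chart QA queries.

02. DePlot's output can directly prompt an off-the-shelf pretrained LLM, so the reasoning step works without extensive fine-tuning.

03. The method reduces the training data needed for visual language reasoning: it works in a one-shot setting, whereas prior SOTA models require at least tens of thousands of training examples.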

Abstract

Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples and their reasoning capabilities are still much limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key in this method is a modality conversion module, named as DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the…
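The two-step decomposition in the abstract can be illustrated with a minimal sketch of the second step only: turning an already-translated table into a one-shot prompt. Everything here is an assumption for illustration, not the authors' implementation: the linearized table is assumed to use `|` between cells and a newline between rows, the helper names `parse_linearized_table` and `build_one_shot_prompt` are hypothetical, the sample tables are made up, and the calls to DePlot and to the LLM are left out.

```python
from typing import List


def parse_linearized_table(text: str) -> List[List[str]]:
    """Split a linearized table into rows of stripped cell strings.

    Assumes '|' separates cells and a newline separates rows.
    """
    rows = []
    for line in text.strip().splitlines():
        if line.strip():
            rows.append([cell.strip() for cell in line.split("|")])
    return rows


def build_one_shot_prompt(
    example_table: str,
    example_question: str,
    example_answer: str,
    table: str,
    question: str,
) -> str:
    """Build a prompt with one solved example followed by the new query."""
    return (
        "Read the table and answer the question.\n\n"
        f"Table:\n{example_table.strip()}\n"
        f"Question: {example_question}\n"
        f"Answer: {example_answer}\n\n"
        f"Table:\n{table.strip()}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up tables standing in for the output of a plot-to-table model.
example_table = "Year | Sales\n2020 | 10\n2021 | 15"
new_table = "Country | Share\nA | 40\nB | 60"

prompt = build_one_shot_prompt(
    example_table,
    "In which year were sales higher?",
    "2021",
    new_table,
    "Which country has the larger share?",
)
rows = parse_linearized_table(new_table)
```

The resulting `prompt` string would then be sent to a pretrained LLM of the reader's choice; `rows` is only a convenience for inspecting the table before prompting.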

Code & Models

Repositories

huggingface/transformers
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Video Analysis and Summarization