Large Language Models Understand Layout
Weiming Li, Manni Duan, Dong An, Yan Shao

TL;DR
This paper demonstrates that large language models can understand and reason about text layouts denoted by spatial markers. It also shows that this ability can be strengthened through targeted training data and instruction tuning, and that it benefits visual question-answering systems.
Contribution
It reveals that LLMs' layout understanding stems mainly from code data used in pretraining and is further enhanced by instruction tuning. It also introduces a low-cost approach based on auto-generated data to improve this ability.
Findings
LLMs can answer spatial reasoning questions with layout cues
Layout understanding improves with instruction tuning and auto-generated data
Layout understanding enhances visual question-answering performance
Abstract
Large language models (LLMs) demonstrate extraordinary abilities in a wide range of natural language processing (NLP) tasks. In this paper, we show that, beyond text understanding capability, LLMs are capable of processing text layouts that are denoted by spatial markers. They are able to answer questions that require explicit spatial perception and reasoning, while a drastic performance drop is observed when the spatial markers from the original data are excluded. We perform a series of experiments with the GPT-3.5, Baichuan2, Llama2 and ChatGLM3 models on various types of layout-sensitive datasets for further analysis. The experimental results reveal that the layout understanding ability of LLMs is mainly introduced by the code data used in pretraining, and is further enhanced at the instruction-tuning stage. In addition, layout understanding can be enhanced by integrating low-cost,…
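The abstract does not specify the exact marker format. As a minimal sketch, assume that spatial markers are plain spaces and newlines on a character grid, and that each token carries hypothetical (x, y) grid coordinates, such as an OCR step might produce. Under those assumptions, the sketch contrasts a layout-preserving input with an ablated input where the markers are removed. The names `Token`, `render_with_markers` and `strip_markers` are illustrative only and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    x: int  # hypothetical column position in character-grid units
    y: int  # hypothetical row index


def render_with_markers(tokens: list[Token]) -> str:
    """Place tokens on a character grid: newlines separate rows and
    padding spaces move each token to its column (the spatial markers)."""
    rows: dict[int, list[Token]] = {}
    for tok in tokens:
        rows.setdefault(tok.y, []).append(tok)
    lines = []
    # Emit every row index up to the last one, so vertical gaps become blank lines.
    for y in range(max(rows) + 1 if rows else 0):
        line = ""
        for tok in sorted(rows.get(y, []), key=lambda t: t.x):
            if len(line) < tok.x:
                line += " " * (tok.x - len(line))  # pad to the token's column
            elif line:
                line += " "  # overlapping columns: keep at least one separator
            line += tok.text
        lines.append(line)
    return "\n".join(lines)


def strip_markers(text: str) -> str:
    """Ablation: collapse all whitespace so only reading-order text remains."""
    return " ".join(text.split())


# A tiny two-column table.
table = [
    Token("Name", 0, 0), Token("Score", 10, 0),
    Token("Alice", 0, 1), Token("92", 10, 1),
    Token("Bob", 0, 2), Token("85", 10, 2),
]
with_layout = render_with_markers(table)
without_layout = strip_markers(with_layout)
```

With the markers in place, each score sits directly under the `Score` header. In the stripped version, a question such as "what is Bob's score?" must be answered from reading order alone. The paper reports a comparable kind of degradation when spatial markers are excluded.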
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Machine Learning and Algorithms
Methods: Linear Layer · Adam · Dropout · Dense Connections · Weight Decay · Multi-Head Attention · Residual Connection
