Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks

Grant Wardle; Teo Susnjak

arXiv:2410.03062·cs.AI·October 7, 2024

TL;DR

This study investigates how the order of images and text in multi-modal prompts affects the reasoning accuracy of large language models. Sequencing affects accuracy more in simple, single-image tasks than in complex, multi-image ones, and the structure of the prompt itself also shapes performance.

Contribution

It provides empirical evidence on the influence of modality sequencing in multi-modal prompts and highlights the significance of prompt structure in complex reasoning tasks.

Findings

01

Modality order significantly affects accuracy on simple, single-image tasks.

02

Sequencing impact diminishes in complex, multi-image reasoning tasks.

03

Prompt structure and logical flow are crucial for multi-modal reasoning.

Abstract

This paper examines how the sequencing of images and text within multi-modal prompts influences the reasoning performance of large language models (LLMs). We performed empirical evaluations using three commercial LLMs. Our results demonstrate that the order in which modalities are presented can significantly affect performance, particularly in tasks of varying complexity. For simpler tasks involving a single image, modality sequencing had a clear impact on accuracy. However, in more complex tasks involving multiple images and intricate reasoning steps, the effect of sequencing diminished, likely due to the increased cognitive demands of the task. Our findings also highlight the importance of question/prompt structure. In nested and multi-step reasoning tasks, modality sequencing played a key role in shaping model performance. While LLMs excelled in the initial stages of reasoning, they…
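The two sequencing conditions described in the abstract can be illustrated with a minimal sketch. It assumes a generic list-of-parts prompt representation; the function name `build_prompt`, the `type`/`text`/`ref` keys, and the file names are hypothetical and are not taken from the paper or from any particular vendor API.

```python
from typing import Literal

# Hypothetical prompt part: a plain dict describing either text or an image.
Part = dict[str, str]


def build_prompt(
    question: str,
    image_refs: list[str],
    order: Literal["image_first", "text_first"],
) -> list[Part]:
    """Return prompt parts with the images placed before or after the text."""
    text_part: Part = {"type": "text", "text": question}
    image_parts: list[Part] = [{"type": "image", "ref": ref} for ref in image_refs]

    if order == "image_first":
        return image_parts + [text_part]
    if order == "text_first":
        return [text_part] + image_parts
    raise ValueError(f"unknown order: {order!r}")


# Same content, two orderings: the only variable that changes is the sequence.
question = "Which option matches the diagram?"
image_first = build_prompt(question, ["diagram.png"], "image_first")
text_first = build_prompt(question, ["diagram.png"], "text_first")
```

Holding the question and images fixed while varying only `order` is one way to isolate the effect of sequencing when comparing model accuracy across conditions.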

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems