Skipping Computations in Multimodal LLMs

Mustafa Shukor, Matthieu Cord

arXiv:2410.09454 · cs.CV · October 15, 2024

TL;DR

This paper investigates computational redundancy in Multimodal Large Language Models during inference, proposing methods to skip or parallelize layers, significantly reducing computation costs while maintaining performance.

Contribution

The study introduces techniques to skip and parallelize computations in MLLMs, demonstrating substantial efficiency gains with little or no loss in performance.

Findings

01

Significant computation can be avoided during inference, especially for VQA tasks.

02

Skipping computations during training recovers 97% of the original performance, even when a substantial fraction of the layers is skipped.

03

Training smaller models can achieve performance comparable to larger models.

Abstract

Large Language Models (LLMs) have demonstrated remarkable success in both textual and multimodal domains. However, this success often comes with substantial computational costs, particularly when handling long sequences of multimodal inputs. This has sparked many efforts to improve efficiency during training and inference. In this study, we investigate the computational redundancy in Multimodal Large Language Models (MLLMs) during inference. We propose different methods to skip computations, such as skipping entire blocks, FFN layers, or self-attention (SA) layers. Additionally, we explore parallelizing certain layers, such as FFN and SA layers. Our findings validate that (1) a significant amount of computation can be avoided at inference time, especially for tasks such as Visual Question Answering (VQA); (2) skipping computations during training can recover 97% of the original…
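The skipping and parallelization options named in the abstract can be illustrated on a single residual transformer block. The sketch below is a toy, not the authors' implementation (that lives in the mshukor/ima-lmms repository): it assumes a residual block without layer normalization, stand-in sublayers acting on plain Python lists, and hypothetical names (`block_forward`, `model_forward`, the mode strings, `start_layer`, `interval`) chosen only for illustration. The rule of skipping every `interval`-th block from `start_layer` is likewise an assumed schedule.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, standing in for a residual connection."""
    return [x + y for x, y in zip(a, b)]


def block_forward(x: Vector, sa: Sublayer, ffn: Sublayer, mode: str = "full") -> Vector:
    """One toy residual block (no normalization) with optional skipping."""
    if mode == "full":
        h = add(x, sa(x))          # SA sublayer plus residual
        return add(h, ffn(h))      # FFN reads the SA output (sequential)
    if mode == "skip_block":
        return list(x)             # whole block is the identity on the residual stream
    if mode == "skip_sa":
        return add(x, ffn(x))      # only the FFN sublayer is computed
    if mode == "skip_ffn":
        return add(x, sa(x))       # only the SA sublayer is computed
    if mode == "parallel":
        # SA and FFN both read the same input, so they could run concurrently;
        # the total work is unchanged, but the sequential depth is halved.
        return add(add(x, sa(x)), ffn(x))
    raise ValueError(f"unknown mode: {mode}")


def model_forward(
    x: Vector,
    layers: Sequence[Tuple[Sublayer, Sublayer]],
    mode: str = "full",
    start_layer: int = 0,
    interval: int = 2,
) -> Vector:
    """Apply `mode` to every `interval`-th block from `start_layer`; run the rest in full."""
    for i, (sa, ffn) in enumerate(layers):
        selected = i >= start_layer and (i - start_layer) % interval == 0
        x = block_forward(x, sa, ffn, mode if selected else "full")
    return x


if __name__ == "__main__":
    # Hypothetical stand-in sublayers; real SA/FFN layers are learned.
    sa = lambda v: [0.5 * t for t in v]
    ffn = lambda v: [t + 1.0 for t in v]
    layers = [(sa, ffn), (sa, ffn)]
    print(model_forward([1.0, 2.0], layers, "full"))        # [13.0, 22.0]
    print(model_forward([1.0, 2.0], layers, "skip_block"))  # [4.0, 7.0]
```

With two blocks, `interval=2` and `start_layer=0`, the `skip_block` run computes only the second block, halving the work. The `parallel` mode is not equivalent to the sequential block, since the FFN no longer sees the SA output, so it changes the function the model computes rather than merely reordering it.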

Code & Models

Repositories

mshukor/ima-lmms
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques