Modality Inflation: Energy Characterization and Optimization Opportunities for MLLM Inference
Mona Moghadampanah, Adib Rezaei Shahmirzadi, Farhana Amin, Dimitrios S. Nikolopoulos

TL;DR
This paper analyzes the energy cost of multimodal large language model (MLLM) inference, identifies significant inefficiencies introduced by multimodal inputs, and proposes stage-wise dynamic voltage and frequency scaling (DVFS) as an effective way to improve energy efficiency during inference.
Contribution
It provides the first detailed stage-level energy analysis of MLLM inference and demonstrates practical optimization strategies, such as stage-wise DVFS, that reduce energy consumption.
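As a rough illustration of how a stage-wise DVFS policy might choose clocks, this sketch picks, for each inference stage, the lowest-energy GPU clock from a profiled table whose latency stays within a slowdown budget. The summary does not describe the authors' actual policy. The `OperatingPoint` type, `pick_stage_frequencies`, the 5% budget, and every profile number are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One profiled GPU clock setting for one inference stage (hypothetical)."""
    freq_mhz: int     # locked SM clock
    latency_s: float  # stage latency measured at this clock
    energy_j: float   # stage energy measured at this clock


def pick_stage_frequencies(profiles, max_slowdown=0.05):
    """Per stage, return the lowest-energy point whose latency is within
    (1 + max_slowdown) of that stage's fastest profiled latency."""
    plan = {}
    for stage, points in profiles.items():
        fastest = min(p.latency_s for p in points)
        budget = fastest * (1.0 + max_slowdown)
        # The fastest point always satisfies the budget, so this is never empty.
        feasible = [p for p in points if p.latency_s <= budget]
        plan[stage] = min(feasible, key=lambda p: p.energy_j)
    return plan


# Hypothetical profile: illustrative numbers only, not results from the paper.
profiles = {
    "vision_encode": [OperatingPoint(1410, 0.040, 10.0),
                      OperatingPoint(1110, 0.050, 9.5),
                      OperatingPoint(810, 0.068, 9.8)],
    "prefill": [OperatingPoint(1410, 0.120, 36.0),
                OperatingPoint(1110, 0.125, 31.0),
                OperatingPoint(810, 0.165, 33.0)],
    "decode": [OperatingPoint(1410, 2.00, 400.0),
               OperatingPoint(1110, 2.04, 340.0),
               OperatingPoint(810, 2.08, 300.0)],
}

plan = pick_stage_frequencies(profiles)
for stage, point in plan.items():
    print(f"{stage}: {point.freq_mhz} MHz, {point.energy_j} J")
```

With these placeholder numbers the policy keeps vision encoding at 1410 MHz, lowers prefill to 1110 MHz, and lowers decoding to 810 MHz. The placeholder decode numbers are shaped to reflect that memory-bound decoding often loses little speed at lower clocks. Because end-to-end latency is the sum of stage latencies, a per-stage budget also bounds the total slowdown by the same fraction.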
Findings
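Relative to text-only baselines, multimodal inference incurs energy overheads ranging from 17% to 94%, depending on the model.
Depending on the model, the main energy bottleneck is either vision encoding or the expanded token sequence.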
Stage-wise DVFS reduces energy use with minimal performance loss.
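The overhead figures compare the total energy of a multimodal request with a text-only baseline. A minimal sketch of that arithmetic follows, assuming per-stage energies in joules have already been measured and that the baseline simply has no vision-encoding stage. The function names and all numbers are hypothetical.

```python
def total_energy(stages):
    """Sum per-stage energies (J); a missing stage counts as 0 J."""
    return sum(stages.values())


def energy_overhead_pct(mm_stages, text_stages):
    """Extra energy of multimodal inference relative to the text-only baseline, in percent."""
    e_mm = total_energy(mm_stages)
    e_text = total_energy(text_stages)
    return 100.0 * (e_mm - e_text) / e_text


def dominant_extra_stage(mm_stages, text_stages):
    """Stage contributing the most additional energy over the baseline."""
    extra = {s: e - text_stages.get(s, 0.0) for s, e in mm_stages.items()}
    return max(extra, key=extra.get)


# Hypothetical per-stage energies (J) for one request; not data from the paper.
text_only = {"prefill": 20.0, "decode": 180.0}
multimodal = {"vision_encode": 30.0, "prefill": 70.0, "decode": 190.0}

print(f"overhead: {energy_overhead_pct(multimodal, text_only):.1f}%")
print("largest extra cost:", dominant_extra_stage(multimodal, text_only))
```

In this toy case the longer token sequence makes prefill the largest extra cost. Raising the hypothetical `vision_encode` energy enough would shift the bottleneck to the vision encoder, which is the kind of model-dependent difference the findings describe.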
Abstract
Multimodal large language models (MLLMs) are built on text-only LLMs by incorporating additional modalities, enabling multimodal understanding and a broader range of applications. However, these additions introduce an energy trade-off across modalities that remains poorly understood, as most prior work focuses on text-only models. In this paper, we examine modality inflation, a key source of inefficiency in which multimodal inputs increase inference workloads through extra encoding stages and expanded token sequences. We provide the first detailed, stage-level analysis of energy consumption in MLLM inference by breaking the pipeline into vision encoding, prefill, and decoding stages. Using four representative MLLMs evaluated on an NVIDIA A100 GPU, we quantify the additional energy required for multimodal inference compared to text-only baselines, observing overheads…
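A stage-level breakdown needs energy per stage rather than per request. One common approach integrates sampled GPU power over each stage's time window with the trapezoidal rule. This sketch assumes power is sampled at known timestamps (for example via NVML's `nvmlDeviceGetPowerUsage`, which reports milliwatts and would need converting to watts) and that stage start and end times are logged. The summary does not state the authors' measurement method, and the function names, stage windows, and sample values here are hypothetical.

```python
def stage_energy_j(samples, start_s, end_s):
    """Trapezoidal integral of power (W) over samples with start_s <= t <= end_s.

    samples: list of (timestamp_s, power_w) pairs sorted by time.
    """
    pts = [(t, p) for t, p in samples if start_s <= t <= end_s]
    return sum((t1 - t0) * (p0 + p1) / 2.0
               for (t0, p0), (t1, p1) in zip(pts, pts[1:]))


def per_stage_energy(samples, windows):
    """Map each stage name to its integrated energy in joules."""
    return {stage: stage_energy_j(samples, start, end)
            for stage, (start, end) in windows.items()}


# Hypothetical power trace and stage boundaries (seconds), for illustration only.
trace = [(0.0, 200.0), (0.5, 200.0), (1.0, 200.0),
         (1.5, 300.0), (2.0, 300.0), (3.0, 100.0)]
windows = {"vision_encode": (0.0, 1.0), "prefill": (1.0, 2.0), "decode": (2.0, 3.0)}

energies = per_stage_energy(trace, windows)
print(energies)
```

This simple version only uses samples that fall inside each window, so energy between the last sample and a stage boundary is dropped unless samples align with the boundaries. Interpolating power at each boundary, or subtracting idle power, are refinements one might add depending on what the analysis needs to attribute.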
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling
