SpikeMLLM: Spike-based Multimodal Large Language Models via Modality-Specific Temporal Scales and Temporal Compression

Han Xu; Zhiyong Qin; Di Shang; Jiahong Zhang; Xuerui Qiu; Bo Lei; Tiejun Huang; Bo Xu; Guoqi Li

arXiv:2604.18610·cs.NE·April 22, 2026

TL;DR

SpikeMLLM introduces a spike-based framework for multimodal large language models, significantly reducing energy consumption and computation while maintaining performance, enabled by modality-specific temporal scales and compression techniques.

Contribution

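SpikeMLLM is presented as the first spike-based framework for MLLMs. It unifies existing ANN quantization methods in the spiking representation space and adds Modality-Specific Temporal Scales (MSTS) and Temporally Compressed LIF (TC-LIF) for timestep compression.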

Findings

01 Maintains near-lossless performance with aggressive timestep compression.

02 Achieves 9.06x higher throughput and 25.8x better power efficiency on dedicated hardware.

03 Outperforms FP16 GPU baseline in energy efficiency and throughput.

Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable progress but incur substantial computational overhead and energy consumption during inference, limiting deployment in resource-constrained environments. Spiking Neural Networks (SNNs), with their sparse event-driven computation, offer inherent energy efficiency advantages on neuromorphic hardware, yet extending them to MLLMs faces two key challenges: heterogeneous modalities make uniform spike encoding insufficient, and high-resolution image inputs amplify timestep unfolding overhead. We propose SpikeMLLM, the first spike-based framework for MLLMs, which unifies existing ANN quantization methods in the spiking representation space and incorporates Modality-Specific Temporal Scales (MSTS) guided by Modality Evolution Discrepancy (MED) and Temporally Compressed LIF (TC-LIF) for timestep compression from T=L-1 to…
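To make the "timestep unfolding" cost mentioned in the abstract concrete, the following is a minimal sketch of how an L-level quantized activation can be unfolded into T = L-1 binary spikes. It assumes a uniform unsigned quantizer and a plain integrate-and-fire neuron with soft reset; the function names and the example values are hypothetical, and this is not the paper's MSTS or TC-LIF, only the uncompressed baseline those techniques aim to shorten.

```python
def quantize(x, scale, levels):
    """Uniform unsigned quantization of x to an integer in [0, levels - 1].

    Assumption: a plain round-and-clamp quantizer, chosen for illustration.
    """
    q = round(x / scale)
    return max(0, min(levels - 1, q))


def unfold_to_spikes(q, levels):
    """Unfold integer q into T = levels - 1 binary spikes.

    Uses an integrate-and-fire neuron with soft reset. Integer arithmetic
    (input q per step, threshold T) avoids floating-point drift, so the
    spike count equals q exactly.
    """
    T = levels - 1
    membrane = 0
    spikes = []
    for _ in range(T):
        membrane += q              # constant input at every timestep
        if membrane >= T:          # threshold crossing emits one spike
            spikes.append(1)
            membrane -= T          # soft reset keeps the residual charge
        else:
            spikes.append(0)
    return spikes


def decode(spikes, scale):
    """Recover the quantized value: spike count times the quantization scale."""
    return sum(spikes) * scale


levels = 16                        # 4-bit activation, so T = 15 timesteps
scale = 0.1                        # hypothetical quantization step
q = quantize(0.73, scale, levels)
spikes = unfold_to_spikes(q, levels)
print(len(spikes), sum(spikes), decode(spikes, scale))
```

Under these assumptions the spike count reproduces the quantized integer exactly, but the number of timesteps grows linearly with the number of quantization levels, which is the overhead that timestep compression targets.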
