CHIME: Chiplet-based Heterogeneous Near-Memory Acceleration for Edge Multimodal LLM Inference
Yanru Chen, Runyang Tian, Yue Pan, Zheyu Li, Weihong Xu, Tajana Rosing

TL;DR
CHIME introduces a chiplet-based heterogeneous near-memory architecture that combines DRAM and RRAM to accelerate edge multimodal LLM inference, reporting up to 54x speedup over NVIDIA Jetson Orin NX and up to 246x better energy efficiency per inference.
Contribution
CHIME presents a heterogeneous hardware design and a co-designed mapping framework that together reduce data movement and improve performance for edge multimodal LLM (MLLM) inference.
Findings
Achieves up to 54x speedup over NVIDIA Jetson Orin NX.
Provides up to 246x better energy efficiency per inference.
Delivers up to 69.2x higher throughput than state-of-the-art PIM accelerators.
Abstract
The proliferation of large language models (LLMs) is accelerating the integration of multimodal assistants into edge devices, where inference is executed under stringent latency and energy constraints, often exacerbated by intermittent connectivity. These challenges become particularly acute in the context of multimodal LLMs (MLLMs), as high-dimensional visual inputs are transformed into extensive token sequences, thereby inflating the key-value (KV) cache and imposing substantial data movement overheads on the LLM backbone. To address these issues, we present CHIME, a chiplet-based heterogeneous near-memory acceleration architecture for edge MLLM inference. CHIME leverages the complementary strengths of integrated monolithic 3D (M3D) DRAM and RRAM chiplets: DRAM supplies low-latency bandwidth for attention, while RRAM offers dense, non-volatile storage for weights. This heterogeneous hardware is…
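To make the KV-cache growth described in the abstract concrete, the following is a minimal back-of-the-envelope sketch using the standard estimate for a decoder-only transformer (keys plus values, per layer, per KV head, per token). The model dimensions, the text prompt length, and the visual token count are hypothetical values chosen for illustration and are not taken from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Estimate KV-cache size in bytes for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes FP16/BF16 storage (an assumption, not from the paper).
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical backbone configuration (illustrative only, not from the paper).
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128)

TEXT_TOKENS = 64     # hypothetical text prompt length
VISUAL_TOKENS = 576  # hypothetical number of tokens produced from one image

text_only = kv_cache_bytes(seq_len=TEXT_TOKENS, **cfg)
with_image = kv_cache_bytes(seq_len=TEXT_TOKENS + VISUAL_TOKENS, **cfg)

MIB = 1024 ** 2
print(f"text only:  {text_only / MIB:.0f} MiB")
print(f"with image: {with_image / MIB:.0f} MiB")
print(f"growth:     {with_image / text_only:.0f}x")
```

Under these assumed numbers, one image raises the cache from 32 MiB to 320 MiB. Because the estimate is linear in sequence length, every added visual token increases the data that attention must read at each decoding step, which is the data movement overhead the abstract refers to.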
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Big Data and Digital Economy
