Focus Session: Hardware and Software Techniques for Accelerating Multimodal Foundation Models
Muhammad Shafique, Abdul Basit, Muhammad Abdullah Hanif, Alberto Marchisio, Rachmad Vidya Wicaksana Putra, Minghao Shao

TL;DR
This paper presents a hardware-software co-design methodology for accelerating multimodal foundation models (MFMs), reducing computational and memory requirements through model compression, operation-level optimization, and specialized hardware.
Contribution
It presents a novel multi-layered approach combining compression, operation optimization, and hardware acceleration specifically tailored for multimodal foundation models.
Findings
Effective MFM compression via mixed-precision quantization and pruning.
More efficient inference through speculative decoding and small-to-large model cascading.
Demonstrated improvements on medical MFMs and code generation tasks.
Abstract
This work presents a multi-layered methodology for efficiently accelerating multimodal foundation models (MFMs). It combines hardware and software co-design of transformer blocks with an optimization pipeline that reduces computational and memory requirements. During model development, it employs performance enhancements through fine-tuning for domain-specific adaptation. Our methodology further incorporates hardware and software techniques for optimizing MFMs. Specifically, it employs MFM compression using hierarchy-aware mixed-precision quantization and structural pruning for transformer blocks and MLP channels. It also optimizes operations through speculative decoding, model cascading that routes queries through a small-to-large cascade and uses lightweight self-tests to determine when to escalate to larger models, as well as co-optimization of sequence length, visual resolution &…
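The small-to-large cascade with self-tests described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Stage` class, the `generate` and `self_test` callables, and the `threshold` value are all hypothetical names chosen for illustration, and the toy stages stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One model in the cascade (hypothetical structure, not from the paper)."""
    name: str
    generate: Callable[[str], str]           # assumed: query -> answer
    self_test: Callable[[str, str], float]   # assumed: (query, answer) -> confidence in [0, 1]


def cascade(query: str, stages: List[Stage], threshold: float = 0.8) -> Tuple[str, str]:
    """Try stages from smallest to largest and return (stage_name, answer).

    A stage's answer is accepted when its self-test confidence reaches
    `threshold`; otherwise the query escalates to the next, larger stage.
    The last stage always answers, since there is nothing left to escalate to.
    """
    if not stages:
        raise ValueError("need at least one stage")
    for stage in stages[:-1]:
        answer = stage.generate(query)
        if stage.self_test(query, answer) >= threshold:
            return stage.name, answer
    last = stages[-1]
    return last.name, last.generate(query)


# Toy stand-ins for real models: the "small" stage is only confident on short queries.
small = Stage("small", lambda q: q.upper(), lambda q, a: 0.9 if len(q) < 10 else 0.2)
large = Stage("large", lambda q: q.upper() + "!", lambda q, a: 1.0)

easy = cascade("hi", [small, large])
hard = cascade("a much longer query", [small, large])
```

In this sketch the large model runs only when the small model's self-test falls below the threshold, which is where a cascade saves compute. How the self-test is actually computed is not specified in the text above.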
