Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup
Maolin Wang, Yao Zhao, Jiajia Liu, Jingdong Chen, Chenyi Zhuang, Jinjie Gu, Ruocheng Guo, Xiangyu Zhao

TL;DR
This paper presents a multi-stage compression method for large multimodal models that significantly reduces latency and energy consumption while maintaining performance, enabling more sustainable deployment in real-world applications.
Contribution
It introduces a novel multi-stage pruning and distillation strategy tailored for proprietary LLMs, validated on real-world multimodal advertisement data.
Findings
Latency reduced from 700 ms to 90 ms
Estimated annual energy savings of 75 million kWh
Successful deployment in Alipay for three months
Abstract
The deployment of Large Multimodal Models (LMMs) within AntGroup has significantly advanced multimodal tasks in payment, security, and advertising, notably enhancing advertisement audition tasks in Alipay. However, the deployment of such sizable models introduces challenges, particularly in increased latency and carbon emissions, which are antithetical to the ideals of Green AI. This paper introduces a novel multi-stage compression strategy for our proprietary LLM, AntGMM. Our methodology pivots on three main aspects: employing small training sample sizes, addressing multi-level redundancy through multi-stage pruning, and introducing an advanced distillation loss design. In our research, we constructed a dataset, the Multimodal Advertisement Audition Dataset (MAAD), from real-world scenarios within Alipay, and conducted experiments to validate the reliability of our proposed strategy.…
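The abstract does not spell out the "advanced distillation loss design", so the following is only a minimal sketch of the standard temperature-scaled knowledge distillation loss that such designs typically build on, not the paper's actual loss. The function names, the `temperature` and `alpha` parameters, and the example logits are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic soft-label distillation loss (assumed form, not AntGMM's).

    Combines KL(teacher || student) on temperature-softened distributions
    with ordinary cross-entropy against the ground-truth label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Soft term: how far the student's softened distribution is from the teacher's.
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student) if t > 0)

    # Hard term: cross-entropy of the student's unsoftened prediction.
    ce = -math.log(softmax(student_logits)[label])

    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # hypothetical teacher logits
    student = [2.5, 1.5, 0.2]   # hypothetical student logits
    print(distillation_loss(student, teacher, label=0))
```

In this form, `alpha` trades off imitation of the teacher against fitting the labels; a pruned student would be trained to minimize this quantity over the small training sample the abstract mentions.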
Taxonomy
Topics: Video Analysis and Summarization · Topic Modeling · Multimodal Machine Learning Applications
