Addressing Out-of-Distribution Challenges in Image Semantic Communication Systems with Multi-modal Large Language Models
Feifan Zhang, Yuyang Du, Kexin Chen, Yulin Shao, Soung Chang Liew

TL;DR
This paper introduces a framework that uses multi-modal large language models (MLLMs) to handle out-of-distribution inputs in image semantic communication for wireless networks, combining Bayesian optimization of the MLLM's inference with cooperative inference across MLLMs.
Contribution
It proposes a new 'Plan A - Plan B' framework and a Bayesian optimization scheme to enhance MLLM performance in semantic encoding and reconstruction tasks.
Findings
Significant improvement in semantic compression performance.
Effective filtering of irrelevant vocabulary during inference.
Enhanced image reconstruction reliability through cooperative MLLMs.
Abstract
Semantic communication is a promising technology for next-generation wireless networks. However, the out-of-distribution (OOD) problem, where a pre-trained machine learning (ML) model is applied to unseen tasks that are outside the distribution of its training data, may compromise the integrity of semantic compression. This paper explores the use of multi-modal large language models (MLLMs) to address the OOD issue in image semantic communication. We propose a novel "Plan A - Plan B" framework that leverages the broad knowledge and strong generalization ability of an MLLM to assist a conventional ML model when the latter encounters an OOD input in the semantic encoding process. Furthermore, we propose a Bayesian optimization scheme that reshapes the probability distribution of the MLLM's inference process based on the contextual information of the image. The optimization scheme…
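The two ideas in the abstract, falling back from a conventional model to an MLLM on OOD inputs and reshaping the MLLM's output distribution with contextual information, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the OOD test is a simple maximum-softmax-probability threshold, the "Bayesian" step is a plain prior-times-likelihood reweighting over a toy vocabulary, and the names `encode`, `reweight_with_context`, `threshold`, and `context_prior` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def reweight_with_context(mllm_probs, context_prior):
    """Bayes-style reweighting: posterior(word) is proportional to
    mllm_probs(word) * context_prior(word).

    Hypothetical stand-in for reshaping the MLLM's output distribution
    with contextual information about the image.
    """
    likelihood = np.asarray(mllm_probs, dtype=float)
    posterior = likelihood * np.asarray(context_prior, dtype=float)
    total = posterior.sum()
    if total == 0.0:
        # The prior excluded every candidate; keep the MLLM's own distribution.
        return likelihood / likelihood.sum()
    return posterior / total


def encode(plan_a_logits, plan_a_labels, mllm_probs, vocab, context_prior,
           threshold=0.8):
    """Route an input through Plan A, or Plan B when Plan A looks unsure.

    Assumption: low maximum softmax probability is treated as a sign of an
    OOD input. The paper may detect OOD inputs differently.
    """
    probs_a = softmax(plan_a_logits)
    if probs_a.max() >= threshold:
        return "plan_a", plan_a_labels[int(probs_a.argmax())]
    posterior = reweight_with_context(mllm_probs, context_prior)
    return "plan_b", vocab[int(posterior.argmax())]


# Toy inputs: the conventional model only knows two classes.
plan_a_labels = ["cat", "dog"]
vocab = ["cat", "dog", "zebra", "sofa"]
mllm_probs = [0.10, 0.10, 0.35, 0.45]      # raw MLLM preference: "sofa"
context_prior = [0.20, 0.20, 0.55, 0.05]   # context (e.g. a safari scene) disfavors "sofa"

confident = encode([5.0, 0.0], plan_a_labels, mllm_probs, vocab, context_prior)
uncertain = encode([0.1, 0.0], plan_a_labels, mllm_probs, vocab, context_prior)
print(confident, uncertain)
```

In this toy setup the context prior plays the role of filtering out vocabulary that is irrelevant to the scene: without it, the Plan B branch would pick the MLLM's raw top word instead of the context-consistent one.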
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
