Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks

Baoxia Du, Hongyang Du, Dusit Niyato, and Ruidong Li

arXiv:2505.02413 · cs.AI · May 6, 2025

TL;DR

This paper introduces a task-oriented semantic communication framework that uses large multimodal models (LMMs) in vehicle networks. The framework optimizes resource use and improves accuracy on traffic-scenario visual question answering (VQA) tasks, especially under low signal-to-noise ratio (SNR) conditions.

Contribution

The paper proposes an LMM-based semantic communication framework for vehicle AI assistants that combines image slicing with attention-based optimization to improve efficiency and accuracy.

Findings

01 Accuracy improved by 13.4% at 12 dB SNR

02 Accuracy improved by 33.1% at 10 dB SNR

03 Effective resource utilization in traffic VQA scenarios

Abstract

Task-oriented semantic communication has emerged as a fundamental approach for enhancing performance in various communication scenarios. While recent advances in Generative Artificial Intelligence (GenAI), such as Large Language Models (LLMs), have been applied to semantic communication designs, the potential of Large Multimodal Models (LMMs) remains largely unexplored. In this paper, we investigate an LMM-based vehicle AI assistant using a Large Language and Vision Assistant (LLaVA) and propose a task-oriented semantic communication framework to facilitate efficient interaction between users and cloud servers. To reduce computational demands and shorten response time, we optimize LLaVA's image slicing to selectively focus on areas of utmost interest to users. Additionally, we assess the importance of image patches by combining objective and subjective user attention, adjusting energy…
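The abstract describes scoring image patches by combining objective and subjective user attention, and slicing the image to focus on the areas of most interest. The sketch below illustrates one simple way such a combination could work; it is not the authors' method. The weighted-sum blend, the `alpha` parameter, and the function names `combine_attention` and `select_patches` are all assumptions introduced for illustration, since the listing does not specify how the two attention signals are merged.

```python
from typing import List


def combine_attention(objective: List[float],
                      subjective: List[float],
                      alpha: float = 0.5) -> List[float]:
    """Blend two per-patch attention scores into normalized importance.

    Hypothetical rule (an assumption, not taken from the paper):
    importance_i is proportional to alpha * objective_i + (1 - alpha) * subjective_i.
    """
    if len(objective) != len(subjective):
        raise ValueError("attention lists must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    blended = [alpha * o + (1.0 - alpha) * s
               for o, s in zip(objective, subjective)]
    total = sum(blended)
    if total <= 0.0:
        # No attention signal at all: fall back to uniform importance.
        return [1.0 / len(blended)] * len(blended)
    return [b / total for b in blended]


def select_patches(importance: List[float], keep: int) -> List[int]:
    """Return the indices of the `keep` most important patches, in image order."""
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i],
                    reverse=True)
    return sorted(ranked[:keep])


# Toy example with four patches (values are made up for illustration).
objective_attention = [0.1, 0.4, 0.3, 0.2]   # e.g. saliency of each patch
subjective_attention = [0.0, 0.2, 0.6, 0.2]  # e.g. relevance to the user's question

importance = combine_attention(objective_attention, subjective_attention)
selected = select_patches(importance, keep=2)
print(importance)
print(selected)
```

In this toy setting, only the selected patches would be passed on for encoding and transmission. A real system would derive both attention signals from the image and the user's query rather than from hand-written lists.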
