VSLLaVA: a pipeline of large multimodal foundation model for industrial vibration signal analysis
Qi Li, Xinran Zhang, Jinfeng Huang, Hongliang He, Feibin Zhang, Zhaoye Qin, Fulei Chu

TL;DR
VSLLaVA is a specialized large multimodal model pipeline designed for industrial vibration signal analysis, leveraging expert knowledge, a novel dataset, and tailored training and evaluation methods to improve signal classification and fault detection.
Contribution
The paper introduces VSLLaVA, a comprehensive pipeline combining expert-guided instruction tuning, a novel dataset, and specialized evaluation for industrial vibration analysis, advancing domain-specific large multimodal models.
Findings
Significantly improves signal type identification accuracy.
Enhances fault-related signal analysis capabilities.
Demonstrates effective domain-specific model development.
Abstract
While Large Multimodal Models (LMMs) excel in general multimodal tasks, they lack the domain-specific knowledge needed for industrial vibration signal analysis. This paper introduces VSLLaVA, a comprehensive pipeline that uses expert knowledge-guided instruction tuning and evaluation to create an end-to-end LMM for signal analysis. To achieve this, we construct a novel Signal-Question-Answer (SQA) dataset using an expert rule-based signal generator. This dataset supports a two-stage learning procedure. The first stage is efficient instruction fine-tuning with Low-Rank Adaptation (LoRA), which imparts specialized signal identification capabilities. The second stage applies a tailored Group Relative Policy Optimization (GRPO) to refine the reasoning capabilities and enhance classification robustness. Finally, we propose a dual-mode evaluation framework that combines an LLM referee with expert…
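As background for the GRPO stage named in the abstract, the sketch below shows the group-relative advantage step from the standard GRPO formulation: several answers are sampled for the same prompt, each is scored, and every reward is normalized by the mean and standard deviation of its own group. This is not the paper's tailored variant, which the abstract does not detail. The function name, the binary reward values, and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of answers sampled for the same prompt.

    Standard GRPO-style normalization: (r_i - mean) / (std + eps).
    The population standard deviation is used here; implementations differ
    on this choice, so treat it as an assumption.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards: 1.0 if a sampled answer names the correct signal type,
# 0.0 otherwise. These values are illustrative, not taken from the paper.
rewards = [1.0, 0.0, 1.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Answers that score above their group's average receive a positive advantage and are reinforced, while below-average answers receive a negative one. When every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal.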
Taxonomy
Topics: Structural Health Monitoring Techniques · Infrastructure Maintenance and Monitoring · Speech and Audio Processing
