FaultGPT: Industrial Fault Diagnosis Question Answering System by Vision Language Models
Jiao Chen, Ruyi Huang, Zuohong Lv, Jianhua Tang, and Weihua Li

TL;DR
FaultGPT is a novel multimodal vision-language model that generates detailed fault diagnosis reports directly from raw vibration signals, improving accuracy and adaptability in complex industrial systems.
Contribution
The paper introduces FaultGPT, a large vision-language model trained on a new FDQA dataset, enabling end-to-end fault diagnosis report generation from vibration data without extra training parameters.
Findings
FaultGPT outperforms traditional methods in report quality and accuracy.
The model demonstrates strong few-shot and zero-shot capabilities.
Extensive experiments validate its effectiveness across multiple datasets.
Abstract
Recently, employing single-modality large language models based on mechanical vibration signals as Tuning Predictors has introduced new perspectives in intelligent fault diagnosis. However, the potential of these methods to leverage multimodal data remains underexploited, particularly in complex mechanical systems where relying on a single data source often fails to capture comprehensive fault information. In this paper, we present FaultGPT, a novel model that generates fault diagnosis reports directly from raw vibration signals. By leveraging large vision-language models (LVLMs) and text-based supervision, FaultGPT performs end-to-end fault diagnosis question answering (FDQA), distinguishing itself from traditional classification or regression approaches. Specifically, we construct a large-scale FDQA instruction dataset for instruction tuning of LVLMs. This dataset includes vibration…
