Transformer Based Machine Fault Detection From Audio Input
Kiran Voderhobli Holla

TL;DR
This paper explores the use of transformer-based models for machine fault detection from audio data, showing their potential advantages over traditional CNN approaches.
Contribution
It demonstrates the effectiveness of transformer architectures in analyzing sound data for machine fault detection and compares their embeddings with those produced by CNNs.
Findings
Transformer models outperform CNNs in spectrogram analysis for fault detection.
Transformers generate more relevant embeddings for machine failure prediction.
Lower inductive biases in transformers lead to better performance with sufficient data.
Abstract
In recent years, Sound AI has been increasingly used to predict machine failures. By attaching a microphone to the machine of interest, one can get real-time data on machine behavior from the field. Traditionally, Convolutional Neural Network (CNN) architectures have been used to analyze spectrogram images generated from the captured sounds and to predict whether the machine is functioning as expected. CNN architectures seem to work well empirically even though they carry inductive biases, such as locality and parameter sharing, that may not be completely relevant for spectrogram analysis. Since the successful application of transformer-based models to image processing, starting with the Vision Transformer (ViT) in 2020, there has been significant interest in leveraging them in the field of Sound AI. Since transformer-based architectures have significantly lower inductive biases, they are expected to…
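To make the spectrogram-to-transformer input step concrete, here is a minimal sketch of how an audio clip might be turned into a log-magnitude spectrogram and then cut into flattened patches, the token format a ViT-style model consumes. Everything specific in it is an assumption for illustration and not taken from the paper: the synthetic 440 Hz test signal, the 16 kHz sample rate, the frame length and hop size, the 16x16 patch size, and the helper names `log_spectrogram` and `to_patches`.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative values, not settings from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    return np.log1p(magnitude).T  # (freq_bins, time_frames)


def to_patches(spec, patch=16):
    """Split a spectrogram into non-overlapping patch x patch tiles, one row per tile.

    Edges that do not fill a whole tile are cropped away.
    """
    f = (spec.shape[0] // patch) * patch
    t = (spec.shape[1] // patch) * patch
    cropped = spec[:f, :t]
    tiles = cropped.reshape(f // patch, patch, t // patch, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch * patch)


# Hypothetical input: one second of a 440 Hz tone plus noise, standing in for machine audio.
sample_rate = 16000
time_axis = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 440 * time_axis) + 0.1 * rng.standard_normal(sample_rate)

spec = log_spectrogram(signal)
tokens = to_patches(spec)
print(spec.shape, tokens.shape)  # (129, 124) (56, 256)
```

In a ViT-style model each row of `tokens` would then pass through a learned linear projection and receive a position embedding before entering the transformer encoder. A CNN would instead slide small shared filters over `spec` directly, which is where the locality and parameter-sharing biases mentioned above come from.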
