IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer
Changheon Han, Yuseop Sim, Hoin Jung, Jiho Lee, Hojun Lee, Yun Seok Kang, Sucheol Woo, Garam Kim, Hyung Wook Park, Martin Byung-Guk Jun

TL;DR
This paper introduces IMPACT, a self-supervised transformer model trained on a large industrial audio dataset, improving machine sound analysis and outperforming existing methods across diverse industrial audio tasks.
Contribution
The paper presents a new large-scale industrial audio dataset, DINOS, and a novel foundation model, IMPACT, for industrial machine sound analysis, with superior performance on multiple downstream tasks.
Findings
IMPACT outperforms existing models on 24 of 30 tasks
DINOS dataset contains over 74,000 audio samples from industrial scenarios
IMPACT effectively captures both global and fine-grained audio features
Abstract
Acoustic signals from industrial machines offer valuable insights for anomaly detection, predictive maintenance, and operational efficiency enhancement. However, existing task-specific, supervised learning methods often scale poorly and fail to generalize across diverse industrial scenarios, whose acoustic characteristics are distinct from general audio. Furthermore, the scarcity of accessible, large-scale datasets and pretrained models tailored for industrial audio impedes community-driven research and benchmarking. To address these challenges, we introduce DINOS (Diverse INdustrial Operation Sounds), a large-scale open-access dataset. DINOS comprises over 74,149 audio samples (exceeding 1,093 hours) collected from various industrial acoustic scenarios. We also present IMPACT (Industrial Machine Perception via Acoustic Cognitive Transformer), a novel foundation model for industrial…
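As a rough sanity check on the figures quoted above, the sketch below derives an average clip length from the dataset totals and a win rate from the task counts. It assumes the stated totals (74,149 samples, 1,093 hours, 24 of 30 tasks) can be treated as exact, although the abstract presents the dataset numbers as lower bounds. The function names are illustrative and do not come from the paper.

```python
# Figures taken from the summary and abstract above.
# Assumption: the abstract's "over"/"exceeding" totals are treated as exact.
NUM_SAMPLES = 74_149   # audio samples in DINOS
TOTAL_HOURS = 1_093    # total recorded duration in hours
TASKS_WON = 24         # tasks where IMPACT is reported best
TASKS_TOTAL = 30       # total downstream tasks


def mean_clip_seconds(num_samples: int, total_hours: float) -> float:
    """Average clip length in seconds implied by the two totals."""
    return total_hours * 3600 / num_samples


def win_rate(won: int, total: int) -> float:
    """Fraction of tasks on which the model is reported best."""
    return won / total


if __name__ == "__main__":
    print(f"mean clip length: {mean_clip_seconds(NUM_SAMPLES, TOTAL_HOURS):.1f} s")
    print(f"win rate: {win_rate(TASKS_WON, TASKS_TOTAL):.0%}")
```

Under these assumptions the average clip is a little under a minute long. The true value may differ, because both totals are reported as lower bounds.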
Peer Reviews
Decision · Submitted to ICLR 2026
The paper is unique and interesting. The paper is written well and contains detailed experiments. The self-supervised model IMPACT, trained on the proposed data, achieves the best performance across the majority of the tasks.
The paper has limited novelty. The primary contribution of the paper is the dataset; the IMPACT model is based on a well-known existing self-supervised model, EAT.
- The paper is well written, apart from some minor grammatical errors. Authors, please recheck for missing spaces and punctuation.
- A comprehensive benchmarking setup is provided, with distinct pretraining and downstream benchmarking sets.
- The limited availability of public, large-scale corpora is a major pain point in manufacturing and floor monitoring, so the dataset could indeed prove invaluable to the community.
- The evaluation, to the extent done in the paper, is good.
- Based on the results alone, it is hard to say how useful the proposed dataset is over the publicly available DCASE2025 Challenge Task 2 dataset for pretraining.
1. The collection of DINOS is an earnest effort. DINOS consists of the signals collected from both a microphone and a stethoscope, and covers various types of equipment.
2. The authors evaluated the performance of various off-the-shelf pretrained models on DINOS.
1. The evaluation is critically insufficient and cannot show the superiority of IMPACT. The authors did not apply other pretraining methods (e.g., AudioMAE) to DINOS. They only evaluated off-the-shelf pretrained models (e.g., a model pretrained with the AudioMAE method on other acoustic datasets) on DINOS. Since IMPACT is a pretraining method, demonstrating its superiority requires the authors to **pretrain** IMPACT and other pretraining methods (e.g., AudioMAE) **on the same data
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Machine Fault Diagnosis Techniques
