Cross-Modal Learning for Anomaly Detection in Complex Industrial Process: Methodology and Benchmark
Gaochang Wu, Yapeng Zhang, Lan Deng, Jingxin Zhang, Tianyou Chai

TL;DR
This paper introduces FmFormer, a cross-modal Transformer for anomaly detection in industrial processes. It combines video with process variables (arc current) to improve accuracy and robustness, and is validated on a large-scale fused magnesium smelting benchmark.
Contribution
It presents a novel cross-modal Transformer architecture with a new tokenization paradigm for hierarchical anomaly detection and introduces a comprehensive benchmark dataset for fused magnesium smelting.
Findings
Reports state-of-the-art anomaly detection accuracy on the fused magnesium smelting benchmark.
Effectively handles visual occlusion and current fluctuations.
Provides a large-scale benchmark dataset for future research.
Abstract
Anomaly detection in complex industrial processes plays a pivotal role in ensuring efficient, stable, and secure operation. Existing anomaly detection methods primarily focus on analyzing dominant anomalies using the process variables (such as arc current) or constructing neural networks based on abnormal visual features, while overlooking the intrinsic correlation of cross-modal information. This paper proposes a cross-modal Transformer (dubbed FmFormer), designed to facilitate anomaly detection by exploring the correlation between visual features (video) and process variables (current) in the context of the fused magnesium smelting process. Our approach introduces a novel tokenization paradigm to effectively bridge the substantial dimensionality gap between the 3D video modality and the 1D current modality in a multiscale manner, enabling a hierarchical reconstruction of pixel-level…
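The tokenization idea in the abstract (turning a 3D video and a 1D current signal into token sequences that attention can relate) can be illustrated with a minimal NumPy sketch. This is not the authors' FmFormer implementation: the patch and window sizes, the three current channels, the random projection matrices, and the function names `tokenize_video`, `tokenize_current`, and `cross_attention` are all assumptions made for illustration, and the sketch works at a single scale rather than in the multiscale manner the paper describes.

```python
import numpy as np

def tokenize_video(video, patch):
    """Split a (T, H, W) video into non-overlapping 3D patches, one flat token each."""
    T, H, W = video.shape
    pt, ph, pw = patch
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # patch-grid axes first, in-patch axes last
    return x.reshape(-1, pt * ph * pw)

def tokenize_current(current, window):
    """Split an (L, C) current signal into non-overlapping windows, one flat token each."""
    L, C = current.shape
    assert L % window == 0
    return current.reshape(L // window, window * C)

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention (no learned Q/K/V maps, for brevity)."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ keys_values, weights

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 32, 32))  # hypothetical clip: 8 frames of 32x32
current = rng.standard_normal((64, 3))    # hypothetical signal: 64 samples, 3 channels

video_tokens = tokenize_video(video, (2, 8, 8))  # 4*4*4 = 64 tokens of size 128
current_tokens = tokenize_current(current, 8)    # 8 tokens of size 24

# Random linear projections (stand-ins for learned embeddings) map both
# modalities into one shared width so they can attend to each other.
d_model = 16
W_video = rng.standard_normal((video_tokens.shape[1], d_model)) / np.sqrt(video_tokens.shape[1])
W_current = rng.standard_normal((current_tokens.shape[1], d_model)) / np.sqrt(current_tokens.shape[1])
v = video_tokens @ W_video
c = current_tokens @ W_current

# Each current token queries all video tokens.
fused, attn = cross_attention(c, v)
```

Here the dimensionality gap is handled purely by choosing patch and window sizes and projecting both token sets to `d_model`; a real model would learn the projections and the query, key, and value maps.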
Taxonomy
Topics: Fault Detection and Control Systems
Methods: Residual Connection · Softmax · Layer Normalization · Focus · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention
