StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data
Victor Pellegrain (1, 2), Myriam Tami (2), Michel Batteux (1), Céline Hudelot (2) ((1) Institut de Recherche Technologique SystemX, (2) Université Paris-Saclay, CentraleSupélec, MICS)

TL;DR
StreaMulT introduces a streaming multimodal transformer designed to handle arbitrarily long, heterogeneous data streams, motivated by predictive maintenance, and outperforms existing models on the CMU-MOSEI sentiment analysis benchmark.
Contribution
This paper formalizes the streaming multimodal learning paradigm and proposes StreaMulT, a transformer-based model that processes long, unaligned multimodal data streams using cross-modal attention and memory.
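The cross-modal attention and memory idea can be illustrated with a minimal NumPy sketch. It assumes a MulT-style crossmodal block (queries from one modality, keys and values from another) combined with a Transformer-XL-style memory of past source segments; the function names, shapes, and the single-head, random-weight setup are hypothetical and are not the authors' implementation. Because scores are computed between every target position and every source position, the two modalities need not have the same length or be time-aligned, and capping the memory at `mem_len` vectors keeps per-segment cost bounded however long the stream runs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(target, source, memory, w_q, w_k, w_v):
    """Single-head attention: target-modality queries over source keys/values.

    target: (T_t, d) features of the modality being updated
    source: (T_s, d) features of another modality; T_s may differ from T_t,
            so no temporal alignment between the two is required
    memory: (M, d) cached source features from earlier segments (M may be 0)
    """
    context = np.concatenate([memory, source], axis=0)   # (M + T_s, d)
    q = target @ w_q                                      # (T_t, d_k)
    k = context @ w_k                                     # (M + T_s, d_k)
    v = context @ w_v                                     # (M + T_s, d_v)
    scores = q @ k.T / np.sqrt(k.shape[-1])               # scaled dot products
    return softmax(scores, axis=-1) @ v                   # (T_t, d_v)

def stream(segments, d, mem_len, seed=0):
    """Process (target, source) segments one at a time, keeping at most
    `mem_len` of the most recent source vectors as memory (hypothetical)."""
    rng = np.random.default_rng(seed)
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    memory = np.zeros((0, d))
    outputs = []
    for target, source in segments:
        outputs.append(cross_modal_attention(target, source, memory, w_q, w_k, w_v))
        combined = np.concatenate([memory, source], axis=0)
        # Truncate memory so its size never exceeds mem_len.
        memory = combined[-mem_len:] if mem_len > 0 else combined[:0]
    return outputs

# Usage: 3 "text" tokens attend over 50 "sensor" samples per segment.
rng = np.random.default_rng(1)
d = 8
segments = [(rng.standard_normal((3, d)), rng.standard_normal((50, d))) for _ in range(4)]
outs = stream(segments, d=d, mem_len=64)
print([o.shape for o in outs])  # [(3, 8), (3, 8), (3, 8), (3, 8)]
```

Each output keeps the target modality's length, so a downstream layer sees one fused vector per target position regardless of how densely the source modality was sampled.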
Findings
Outperforms state-of-the-art models on the CMU-MOSEI sentiment analysis dataset
Handles arbitrarily long multimodal input sequences
Highlights the significance of textual embeddings in multimodal tasks
Abstract
The increasing complexity of Industry 4.0 systems brings new challenges for predictive maintenance tasks such as fault detection and diagnosis. A corresponding, realistic setting includes multi-source data streams from different modalities, such as sensor measurement time series, machine images, and textual maintenance reports. These heterogeneous multimodal streams also differ in their acquisition frequency, may embed temporally unaligned information, and can be arbitrarily long, depending on the considered system and task. Whereas multimodal fusion has been studied extensively in a static setting, to the best of our knowledge no previous work considers arbitrarily long multimodal streams alongside related tasks such as prediction across time. Thus, in this paper, we first formalize this paradigm of heterogeneous multimodal learning in a streaming setting as…
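The streaming setting described above (sources sampled at different rates, temporally unaligned, arbitrarily long) can be made concrete with a small Python sketch that lazily merges timestamped observations from several modalities and groups them into fixed-duration windows. The function `window_streams`, the window length, and the example modalities are hypothetical illustrations, not the paper's formalization. Because the merge is done with generators, no stream is ever fully loaded into memory.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator, List, Tuple

def _tag(name: str, stream: Iterable[Tuple[float, Any]]) -> Iterator[Tuple[float, str, Any]]:
    # Attach the modality name to every (timestamp, value) observation.
    for t, v in stream:
        yield t, name, v

def window_streams(streams: Dict[str, Iterable[Tuple[float, Any]]],
                   window: float) -> Iterator[Dict[str, List[Any]]]:
    """Yield one dict per window, mapping modality -> values observed in it.

    Each input stream must be sorted by timestamp; streams may use any rate.
    Windows with no observation for a modality get an empty list, so sparse
    sources (e.g. reports) and dense ones (e.g. sensors) stay side by side.
    """
    names = list(streams)
    # Lazy k-way merge by timestamp; key avoids comparing the values themselves.
    events = heapq.merge(*(_tag(n, s) for n, s in streams.items()), key=lambda e: e[0])
    current = None
    bucket = {n: [] for n in names}
    for t, name, v in events:
        idx = int(t // window)
        if current is None:
            current = idx
        while idx > current:  # close finished windows, including empty gaps
            yield bucket
            bucket = {n: [] for n in names}
            current += 1
        bucket[name].append(v)
    if current is not None:
        yield bucket

streams = {
    "sensor": [(0.0, 1.2), (0.5, 1.3), (1.0, 1.1), (1.5, 1.4), (2.0, 1.0)],  # 2 Hz
    "report": [(1.2, "bearing noise")],                                     # sporadic
}
for i, w in enumerate(window_streams(streams, window=1.0)):
    print(i, w)
# 0 {'sensor': [1.2, 1.3], 'report': []}
# 1 {'sensor': [1.1, 1.4], 'report': ['bearing noise']}
# 2 {'sensor': [1.0], 'report': []}
```

Fixed-duration windows are only one possible choice; they give every modality a common time grid without resampling the dense sources or interpolating the sparse ones.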
Taxonomy
Topics: Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications · Data Stream Mining Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Residual Connection · Adam · Label Smoothing · Byte Pair Encoding
