Multimodal Temporal Fusion Transformers Are Good Product Demand Forecasters
Maarten Sukel, Stevan Rudinac, Marcel Worring

TL;DR
This paper introduces a multimodal transformer-based architecture that integrates visual, textual, and contextual data to improve product demand forecasting accuracy. It targets two limitations of traditional methods: the cold start problem and poor handling of category dynamics.
Contribution
It proposes a novel multimodal demand forecasting model combining convolutional, graph-based, and transformer architectures, outperforming traditional approaches on real-world data.
Findings
Enhanced demand prediction accuracy across diverse products
Effective handling of cold start and category dynamics issues
Demonstrated superiority over traditional methods on large datasets
Abstract
Multimodal demand forecasting aims at predicting product demand utilizing visual, textual, and contextual information. This paper proposes a method for multimodal product demand forecasting using convolutional, graph-based, and transformer-based architectures. Traditional approaches to demand forecasting rely on historical demand, product categories, and additional contextual information such as seasonality and events. However, these approaches have several shortcomings, such as the cold start problem making it difficult to predict product demand until sufficient historical data is available for a particular product, and their inability to properly deal with category dynamics. By incorporating multimodal information, such as product images and textual descriptions, our architecture aims to address the shortcomings of traditional approaches and outperform them. The experiments conducted…
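To make the idea of fusing modalities concrete, here is a minimal sketch, not the authors' architecture. It assumes each modality (image, text, context, and optionally demand history) has already been encoded into a fixed-size embedding, and it fuses them with a single-head self-attention step that uses identity projections in place of learned ones. The names `self_attention` and `forecast`, the mean pooling, and the linear head are all hypothetical choices made for illustration. Because attention operates over a variable-length set of tokens, a product with no sales history can simply omit the history token, which is one way a multimodal model can still produce a forecast in the cold start case.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real model would learn these projections.
    """
    d = len(tokens[0])
    fused = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        fused.append(
            [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)]
        )
    return fused


def forecast(image_emb, text_emb, context_emb, history_emb=None,
             head_weights=None, head_bias=0.0):
    """Fuse modality embeddings and map them to a scalar demand estimate.

    history_emb is optional: a cold-start product has no history token.
    The linear head and its default weights are placeholders, not trained.
    """
    tokens = [image_emb, text_emb, context_emb]
    if history_emb is not None:
        tokens.append(history_emb)
    fused = self_attention(tokens)
    d = len(fused[0])
    # Mean-pool the fused tokens into one vector.
    pooled = [sum(t[i] for t in fused) / len(fused) for i in range(d)]
    if head_weights is None:
        head_weights = [1.0] * d
    return dot(pooled, head_weights) + head_bias


# Toy embeddings (made-up values, two dimensions each).
cold_start = forecast([0.2, 0.1], [0.0, 0.3], [1.0, 0.0])
with_history = forecast([0.2, 0.1], [0.0, 0.3], [1.0, 0.0],
                        history_emb=[0.5, 0.5])
```

In this sketch the same function serves both a new product and one with history; only the number of tokens entering the attention step changes.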
Taxonomy
Topics: Advanced Chemical Sensor Technologies
