Towards Unified Multimodal Financial Forecasting: Integrating Sentiment Embeddings and Market Indicators via Cross-Modal Attention
Sarthak Khanna, Armin Berger, David Berghaus, Tobias Deusser, Lorenz Sparrenberg, Rafet Sifa

TL;DR
This paper introduces STONK, a multimodal framework that combines numerical market data and sentiment-enriched news embeddings using cross-modal attention to enhance daily stock-movement prediction.
Contribution
It presents a novel unified approach integrating textual and numerical data with attention mechanisms for improved financial forecasting.
Findings
STONK outperforms numeric-only baselines in backtesting.
Fusion strategies significantly impact prediction accuracy.
The framework provides scalable solutions for multimodal financial analysis.
Abstract
We propose STONK (Stock Optimization using News Knowledge), a multimodal framework integrating numerical market indicators with sentiment-enriched news embeddings to improve daily stock-movement prediction. By combining numerical and textual embeddings via feature concatenation and cross-modal attention, our unified pipeline addresses limitations of isolated analyses. Backtesting shows STONK outperforms numeric-only baselines. A comprehensive evaluation of fusion strategies and model configurations offers evidence-based guidance for scalable multimodal financial forecasting. Source code is available on GitHub.
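The two fusion strategies named in the abstract, feature concatenation and cross-modal attention, can be illustrated with a minimal sketch. This is not the STONK implementation: the function names, the use of the numeric feature vector as the attention query, the mean-pooling of news embeddings in the concatenation baseline, and the absence of learned projection matrices are all assumptions made for illustration. The sketch also assumes the numeric vector and the news embeddings share the same dimension; a real model would typically project both modalities into a common space first.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def concat_fusion(numeric, news):
    # Hypothetical baseline: append the mean news embedding to the numeric features.
    dim = len(news[0])
    mean_news = [sum(e[i] for e in news) / len(news) for i in range(dim)]
    return list(numeric) + mean_news

def cross_modal_attention(query, news):
    # Scaled dot-product attention: the numeric vector (query) attends over
    # the news embeddings, which serve as both keys and values.
    # Assumption: no learned projections, and len(query) == len(news[i]).
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, e)) / scale for e in news]
    weights = softmax(scores)
    dim = len(news[0])
    context = [sum(w * e[i] for w, e in zip(weights, news)) for i in range(dim)]
    return context, weights

def attention_fusion(numeric, news):
    # Fused representation: numeric features followed by the attended news context.
    context, _ = cross_modal_attention(numeric, news)
    return list(numeric) + context

# Toy example: one day's numeric features and two news embeddings (made-up values).
numeric = [1.0, 0.0]
news = [[1.0, 0.0], [0.0, 1.0]]
context, weights = cross_modal_attention(numeric, news)
fused_attn = attention_fusion(numeric, news)
fused_concat = concat_fusion(numeric, news)
```

In this toy setup the concatenation baseline weights every news item equally, while the attention variant gives more weight to the news embedding that aligns with the numeric query. Either fused vector would then be passed to a downstream classifier for the up/down movement label.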
