FinMultiTime: A Four-Modal Bilingual Dataset for Financial Time-Series Analysis

Wenyan Xu; Dawei Xiang; Yue Liu; Xiyu Wang; Yanxiang Ma; Liang Zhang; Shu Hu; Chang Xu; Jiaheng Zhang

arXiv:2506.05019 · cs.CE · September 12, 2025

TL;DR

FinMultiTime is a comprehensive, large-scale multimodal dataset for financial time-series analysis, integrating diverse data sources across markets to improve prediction accuracy and facilitate advanced research.

Contribution

This paper introduces FinMultiTime, the first large-scale, multimodal financial dataset aligning four different modalities across multiple markets and time resolutions.

Findings

1. Scale and data quality significantly improve prediction accuracy.
2. Multimodal fusion provides moderate gains in Transformer models.
3. A reproducible pipeline allows seamless dataset updates.

Abstract

Pure time series forecasting tasks typically focus exclusively on numerical features; however, real-world financial decision-making requires comparing and analyzing heterogeneous sources of information. Recent advances in deep learning and large language models (LLMs) have made significant strides in capturing sentiment and other qualitative signals, thereby improving the accuracy of financial time series prediction. Despite these advances, most existing datasets consist solely of price series and news text, are confined to a single market, and remain limited in scale. In this paper, we introduce FinMultiTime, the first large-scale, multimodal financial time series dataset. FinMultiTime temporally aligns four distinct modalities (financial news, structured financial tables, K-line technical charts, and stock price time series) across both the S&P 500 and HS 300 universes.…
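The abstract states that the four modalities are temporally aligned but does not spell out the alignment rule. The following is a minimal sketch, assuming a daily trading calendar per ticker and an "as-of" rule that avoids look-ahead bias: news published on a non-trading day is moved to the next trading day, and each day carries the most recent financial table released on or before it. The names `align_modalities` and `AlignedRecord` and the chart file paths are hypothetical, for illustration only; they are not the FinMultiTime pipeline or its file layout.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class AlignedRecord:
    """One trading day for one ticker with all four modalities attached."""
    ticker: str
    day: date
    close: float               # stock price time series value
    news: List[str]            # headlines mapped to this trading day
    table: Optional[dict]      # latest financial table released on or before `day`
    chart_path: Optional[str]  # hypothetical path to a rendered K-line chart


def align_modalities(
    ticker: str,
    prices: List[Tuple[date, float]],
    news: List[Tuple[date, str]],
    tables: List[Tuple[date, dict]],
    charts: Dict[date, str],
) -> List[AlignedRecord]:
    """Hypothetical as-of alignment on a daily trading calendar.

    `prices` and `tables` must be sorted by date. News published on a
    non-trading day is moved to the next trading day. A table is assumed
    to be usable on its release date and stays attached until the next
    release, so no record sees data from its future.
    """
    trading_days = [d for d, _ in prices]
    table_dates = [d for d, _ in tables]

    # Map each headline to the first trading day on or after its publication date.
    news_by_day: Dict[date, List[str]] = {}
    for published, headline in news:
        i = bisect_left(trading_days, published)
        if i < len(trading_days):  # news after the last trading day is dropped
            news_by_day.setdefault(trading_days[i], []).append(headline)

    records: List[AlignedRecord] = []
    for day, close in prices:
        # j - 1 is the index of the last table released on or before `day`.
        j = bisect_right(table_dates, day)
        table = tables[j - 1][1] if j > 0 else None
        records.append(
            AlignedRecord(ticker, day, close, news_by_day.get(day, []), table, charts.get(day))
        )
    return records
```

For example, with trading days on Friday 2024-01-05 and Monday 2024-01-08, a headline dated Saturday 2024-01-06 ends up on the Monday record. Because the S&P 500 and HS 300 follow different exchange calendars, this sketch aligns each ticker against its own trading days rather than a shared global calendar.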

Code & Models

Repositories

marigoldwu/pydgc (PyTorch; marked official)