CSMD: Curated Multimodal Dataset for Chinese Stock Analysis
Yu Liu, Zhuoying Li, Ruifeng Yang, Fengran Mo, Cen Chen

TL;DR
This paper introduces CSMD, a high-quality multimodal dataset tailored for Chinese stock market analysis, along with LightQuant, a framework that enhances stock prediction accuracy using this dataset.
Contribution
The paper presents a curated multimodal dataset specifically for Chinese stocks and a lightweight framework that improves analysis effectiveness over existing resources.
Findings
Demonstrated improved prediction accuracy with the new dataset and framework.
Validated the quality and applicability of CSMD for Chinese stock analysis.
Showed superiority over existing datasets in experimental results.
Abstract
The stock market is a complex and dynamic system in which it is non-trivial for researchers and practitioners to uncover underlying patterns and forecast stock movements. Existing studies of stock market analysis draw on various types of information to extract useful factors, so their results depend heavily on the quality of the data used. However, the currently available resources are mainly based on the U.S. stock market and written in English, which makes them difficult to adapt to other countries. To address these issues, we propose CSMD, a multimodal dataset curated specifically for analyzing the Chinese stock market, with meticulous processing for validated quality. In addition, we develop LightQuant, a lightweight and user-friendly framework for researchers and practitioners with expertise in financial domains. Experimental results on top of our datasets and framework with various…
Taxonomy
Topics: Stock Market Forecasting Methods · Time Series Analysis and Forecasting · Complex Systems and Time Series Analysis
