Using four different online media sources to forecast the crude oil price
M. Elshendy, A. Fronzetti Colladon, E. Battistoni, P. A. Gloor

TL;DR
This paper investigates how signals from Twitter, Google Trends, Wikipedia, and GDELT, analysed over two years with semantic analysis and ARIMAX models, can be combined to improve forecasts of the West Texas Intermediate (WTI) crude oil price.
Contribution
It introduces a multi-platform approach that integrates four online media sources and multiple language features for more accurate oil price prediction.
Findings
Twitter language complexity is highly predictive.
GDELT articles significantly forecast price movements.
Combined media analysis improves forecasting accuracy.
Abstract
This study looks for signals of economic awareness on online social media and tests their significance in economic predictions. The study analyses, over a period of two years, the relationship between the West Texas Intermediate daily crude oil price and multiple predictors extracted from Twitter, Google Trends, Wikipedia, and the Global Data on Events, Language, and Tone database (GDELT). Semantic analysis is applied to study the sentiment, emotionality and complexity of the language used. Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) models are used to make predictions and to confirm the value of the study variables. Results show that the combined analysis of the four media platforms carries valuable information for financial forecasting. Twitter language complexity, GDELT number of articles and Wikipedia page reads have the highest predictive…
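The abstract names ARIMAX models but does not state their orders, lags, or estimation method. As a minimal sketch only, the block below fits a simplified relative of ARIMAX: first differencing (d = 1), p autoregressive lags, and exogenous predictors (for example, a day's GDELT article count or Wikipedia page reads) lagged by one day, all estimated by ordinary least squares. It deliberately omits the moving-average terms of a full ARIMAX, which usually require maximum-likelihood estimation. The function names `fit_arix` and `forecast_next` and the one-day lag are hypothetical choices for illustration, not the authors' setup.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def fit_arix(y: Sequence[float], exog: Sequence[Sequence[float]],
             p: int = 1) -> List[float]:
    """Hypothetical OLS fit of
        dy[t] = c + sum_i phi_i * dy[t-i] + sum_j beta_j * exog[t][j]

    dy[t] = y[t+1] - y[t] (d = 1), so exog[t], observed on day t, is one day
    older than the price change it helps explain (an assumed lag).
    Returns [c, phi_1..phi_p, beta_1..beta_k]. No moving-average terms.
    """
    dy = [y[t + 1] - y[t] for t in range(len(y) - 1)]
    rows, target = [], []
    for t in range(p, len(dy)):
        lags = [dy[t - i] for i in range(1, p + 1)]
        rows.append([1.0] + lags + list(exog[t]))
        target.append(dy[t])
    k = len(rows[0])
    # Normal equations: (X^T X) coef = X^T dy
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    return solve_linear(xtx, xty)


def forecast_next(y: Sequence[float], exog: Sequence[Sequence[float]],
                  coef: Sequence[float], p: int = 1) -> float:
    """One-step-ahead price forecast: y[T+1] = y[T] + predicted dy[T]."""
    T = len(y) - 1
    dy = [y[t + 1] - y[t] for t in range(T)]
    lags = [dy[T - i] for i in range(1, p + 1)]
    x_today = exog[T]  # predictors observed on day T (needs len(exog) > T)
    step = coef[0]
    step += sum(phi * lag for phi, lag in zip(coef[1:1 + p], lags))
    step += sum(b * x for b, x in zip(coef[1 + p:], x_today))
    return y[T] + step
```

With p = 1 and a single predictor column, `coef` comes back as `[c, phi_1, beta_1]`, and each row of `exog` would hold one day's media features. A real analysis would more likely rely on a library estimator such as `statsmodels`' `SARIMAX` with its `exog` argument, which also estimates the moving-average terms.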
Taxonomy
Topics: Market Dynamics and Volatility · Advanced Text Analysis Techniques · Machine Learning in Materials Science
