# Can News Predict the Direction of Oil Price Volatility? A Language Model Approach with SHAP Explanations

**Authors:** Romina Hashami, Felipe Maldonado

arXiv: 2508.20707 · 2025-08-29

## TL;DR

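This paper tests whether financial news alone, without traditional market data, can predict the direction of oil price volatility. It combines sentiment indicators and language-model embeddings in an ensemble framework, benchmarks the results against the HAR model, and uses word-level SHAP explanations to show which terms drive the predictions.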

## Contribution

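The paper introduces an ensemble learning framework that combines sentiment analysis techniques with several language models (FastText, FinBERT, Gemini, and LLaMA) to predict the direction of oil price volatility from news data alone, and it applies SHAP at the word level to explain the predictions.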

## Key findings

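- Raw news count is a robust predictor, whereas most sentiment-based indicators do not consistently outperform the HAR benchmark.
- Among the embedding techniques tested, FastText is the most effective for forecasting directional movements.
- Word-level SHAP analysis shows predictive drivers shifting across market regimes: pre-pandemic, early pandemic, post-shock, and war period.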

## Abstract

Financial markets can be highly sensitive to news, investor sentiment, and economic indicators, leading to important asset price fluctuations. In this study we focus on crude oil, due to its crucial role in commodity markets and the global economy. Specifically, we are interested in understanding the directional changes of oil price volatility, and for this purpose we investigate whether news alone -- without incorporating traditional market data -- can effectively predict the direction of oil price movements. Using a decade-long dataset from Eikon (2014-2024), we develop an ensemble learning framework to extract predictive signals from financial news. Our approach leverages diverse sentiment analysis techniques and modern language models, including FastText, FinBERT, Gemini, and LLaMA, to capture market sentiment and textual patterns. We benchmark our model against the Heterogeneous Autoregressive (HAR) model and assess statistical significance using the McNemar test. While most sentiment-based indicators do not consistently outperform HAR, the raw news count emerges as a robust predictor. Among embedding techniques, FastText proves most effective for forecasting directional movements. Furthermore, SHAP-based interpretation at the word level reveals evolving predictive drivers across market regimes: pre-pandemic emphasis on supply-demand and economic terms; early pandemic focus on uncertainty and macroeconomic instability; post-shock attention to long-term recovery indicators; and war-period sensitivity to geopolitical and regional oil market disruptions. These findings highlight the predictive power of news-driven features and the value of explainable NLP in financial forecasting.
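The abstract says statistical significance against the HAR benchmark is assessed with the McNemar test. As a rough illustration of that kind of comparison, the sketch below runs an exact (binomial) McNemar test on two sets of paired up/down forecasts. The function name `mcnemar_exact`, the toy labels, and the choice of the exact variant are assumptions made for this example; this summary does not say which variant of the test the authors use.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired directional forecasts.

    Hypothetical helper, not taken from the paper. Returns (b, c, p_value),
    where b counts cases only model A predicted correctly and c counts
    cases only model B predicted correctly.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all three sequences must have the same length")

    b = c = 0
    for truth, a, m in zip(y_true, pred_a, pred_b):
        a_ok, m_ok = (a == truth), (m == truth)
        if a_ok and not m_ok:
            b += 1
        elif m_ok and not a_ok:
            c += 1

    n = b + c  # only discordant pairs carry information
    if n == 0:
        return b, c, 1.0

    # Under H0 each discordant pair favours either model with probability 0.5.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Toy labels (1 = volatility up, 0 = volatility down); illustrative only.
    actual = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
    news_model = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    har_model = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]
    print(mcnemar_exact(actual, news_model, har_model))
```

A small p-value would suggest the two models make errors at different rates on the same days. With many discordant pairs, a chi-square approximation is a common alternative to the exact form shown here.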

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20707/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20707/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/2508.20707/full.md

---
Source: https://tomesphere.com/paper/2508.20707