# Accurate total consumer price index forecasting with data augmentation, multivariate features, and sentiment analysis: A case study in Korea

**Authors:** Injae Seo, Minkyoung Kim, Jong Wook Kim, Beakcheol Jang

PMC · DOI: 10.1371/journal.pone.0321530 · PLOS One · 2025-05-13

## TL;DR

This paper proposes a framework for forecasting Korea's Total Consumer Price Index (CPI) that combines a hybrid CNN-LSTM model, multivariate inputs, interpolation-based data augmentation, and a sentiment index derived from CPI-related news.

## Contribution

A novel framework combining CNN-LSTM, multivariate features, data augmentation, and sentiment analysis for CPI forecasting.

## Key findings

- The proposed model achieves lower RMSE than existing approaches.
- Data augmentation and sentiment analysis improve CPI prediction accuracy.
- Multivariate inputs (CPI component indices plus auxiliary variables) give the model richer context on CPI dynamics.
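
For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of how RMSE (root mean squared error) is computed. The function name `rmse` and the sample values are illustrative assumptions, not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared errors, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical CPI values, for illustration only.
observed = [110.0, 110.4, 110.9]
forecast = [110.1, 110.2, 111.0]
print(rmse(observed, forecast))
```

A lower value means the forecasts sit closer to the observed index on average, and because the errors are squared, large misses are penalized more heavily than small ones.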

## Abstract

The Consumer Price Index (CPI) is a key economic indicator used by policymakers worldwide to monitor inflation and guide monetary policy decisions. In Korea, the CPI significantly impacts decisions on interest rates, fiscal policy frameworks, and the Bank of Korea’s strategies for economic stability. Given its importance, accurately forecasting the Total CPI is crucial for informed decision-making. Achieving accurate estimation, however, presents several challenges. First, the Korean Total CPI is calculated as a weighted sum of 462 items grouped into 12 categories of goods and services. This heterogeneity makes it difficult to account for all variations in consumer behavior and price dynamics. Second, the monthly frequency of CPI data results in a relatively sparse time series, limiting the performance of the analysis. Furthermore, external factors such as policy changes and pandemics add further volatility to the CPI. To address these challenges, we propose a novel framework consisting of four key components: (1) a hybrid Convolutional Neural Network-Long Short-Term Memory mechanism designed to capture complex patterns in CPI data, enhancing estimation accuracy; (2) multivariate inputs that incorporate CPI component indices alongside auxiliary variables for richer contextual information; (3) data augmentation through linear interpolation to convert monthly data into daily data, optimizing it for highly parametrized deep learning models; and (4) sentiment index derived from Korean CPI-related news articles, providing insights into external factors influencing CPI fluctuations. Experimental results demonstrate that the proposed model outperforms existing approaches in CPI prediction, as evidenced by lower RMSE values. This improved accuracy has the potential to support the development of more timely and effective economic policies.
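
The augmentation step in component (3) can be sketched as follows. The abstract states only that monthly data are converted to daily data by linear interpolation. Anchoring each monthly value to the first day of its month, the function name `interpolate_monthly_to_daily`, and the sample values are assumptions made for illustration, not the authors' implementation.

```python
from datetime import date, timedelta


def interpolate_monthly_to_daily(monthly):
    """Linearly interpolate (date, value) pairs to one value per day.

    `monthly` must be sorted by date and contain at least one pair.
    """
    daily = []
    for (d0, v0), (d1, v1) in zip(monthly, monthly[1:]):
        span = (d1 - d0).days  # number of days between the two observations
        for k in range(span):
            # Move a fraction k/span of the way from v0 to v1.
            daily.append((d0 + timedelta(days=k), v0 + (v1 - v0) * k / span))
    daily.append(monthly[-1])  # keep the final observation unchanged
    return daily


# Hypothetical monthly index values, assumed to be dated on the 1st.
monthly_cpi = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 2, 1), 103.1),
    (date(2024, 3, 1), 106.2),
]
daily_cpi = interpolate_monthly_to_daily(monthly_cpi)
print(len(daily_cpi), daily_cpi[1])
```

Interpolation increases the number of training samples but adds no new information between observations, so the daily series is best read as a smoothed view of the monthly one.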

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12074598/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12074598/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/PMC12074598/full.md

---
Source: https://tomesphere.com/paper/PMC12074598