Forecasting Credit Ratings: A Case Study where Traditional Methods Outperform Generative LLMs
Felix Drinkall, Janet B. Pierrehumbert, Stefan Zohren

TL;DR
This study compares traditional machine learning methods with large language models for forecasting corporate credit ratings, finding that traditional methods outperform LLMs when integrating diverse data types.
Contribution
The paper evaluates traditional models against LLMs on corporate credit rating prediction and shows that traditional models still outperform LLMs when the task involves multimodal data.
Findings
Traditional models outperform LLMs in credit rating forecasting.
LLMs excel at textual data but lag in multimodal data integration.
XGBoost combining fundamental and macroeconomic data with text-based embedding features outperforms current LLMs on this task.
Abstract
Large Language Models (LLMs) have been shown to perform well for many downstream tasks. Transfer learning can enable LLMs to acquire skills that were not targeted during pre-training. In financial contexts, LLMs can sometimes beat well-established benchmarks. This paper investigates how well LLMs perform in the task of forecasting corporate credit ratings. We show that while LLMs are very good at encoding textual information, traditional methods are still very competitive when it comes to encoding numeric and multimodal data. For our task, current LLMs perform worse than a more traditional XGBoost architecture that combines fundamental and macroeconomic data with high-density text-based embedding features.
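As a rough illustration of what combining fundamental and macroeconomic data with text-based features can look like, the sketch below concatenates the three feature groups into one flat row per company. It is not the authors' pipeline: the function names, the feature keys, and the numeric values are hypothetical, and a hashed bag-of-words vector stands in for a real text embedding so the example runs with the standard library alone.

```python
import hashlib
import math


def hashed_text_features(text, dim=8):
    """Hypothetical stand-in for a dense text embedding.

    Each token is hashed into one of `dim` buckets, and the resulting
    count vector is L2-normalised. A real system would use a learned
    embedding model instead.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def build_feature_row(fundamentals, macro, text, dim=8):
    """Concatenate numeric and text features into one flat feature row.

    Keys are sorted so that every row uses the same column order.
    """
    numeric = [fundamentals[k] for k in sorted(fundamentals)]
    numeric += [macro[k] for k in sorted(macro)]
    return numeric + hashed_text_features(text, dim)


# Illustrative values only; none of these come from the paper.
row = build_feature_row(
    fundamentals={"debt_to_equity": 1.4, "interest_coverage": 3.2},
    macro={"policy_rate": 0.05},
    text="Liquidity remains adequate despite higher refinancing costs",
)
```

Rows built this way could be stacked into a matrix and passed to a gradient-boosted tree classifier such as XGBoost, with the credit rating as the target label.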
Taxonomy
Topics: Statistical and Computational Modeling · Modeling, Simulation, and Optimization · Economic and Technological Developments in Russia
