Next-Year Bankruptcy Prediction from Textual Data: Benchmark and Baselines
Henri Arno, Klaas Mulier, Joke Baeck, Thomas Demeester

TL;DR
This paper establishes a benchmark dataset and evaluation framework for bankruptcy prediction using unstructured textual data, compares classical and neural models, and highlights the effectiveness of simple bag-of-words approaches.
Contribution
It introduces a standardized benchmark for textual bankruptcy prediction and evaluates baseline models, providing a foundation for future research in this area.
Findings
A lightweight bag-of-words model based on static in-domain word representations performs surprisingly well.
Taking textual data from several years into account further improves its results.
Comparing classical and neural baselines exposes the benefits and flaws of each strategy.
Abstract
Models for bankruptcy prediction are useful in several real-world scenarios, and multiple research contributions have been devoted to the task, based on structured (numerical) as well as unstructured (textual) data. However, the lack of a common benchmark dataset and evaluation strategy impedes the objective comparison between models. This paper introduces such a benchmark for the unstructured data scenario, based on novel and established datasets, in order to stimulate further research into the task. We describe and evaluate several classical and neural baseline models, and discuss benefits and flaws of different strategies. In particular, we find that a lightweight bag-of-words model based on static in-domain word representations obtains surprisingly good results, especially when taking textual data from several years into account. These results are critically assessed, and discussed…
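To make the bag-of-words idea in the abstract concrete, here is a minimal sketch of how a firm's filings could be turned into a fixed-size feature vector from static word vectors. It is an illustration under stated assumptions, not the authors' implementation: the toy embedding table `EMB`, the mean pooling over tokens, and the concatenation of per-year vectors are all hypothetical choices, and the function names `doc_vector` and `firm_vector` are invented for this example.

```python
import numpy as np

# Hypothetical toy embedding table (assumption): stands in for static
# in-domain word vectors, which are not reproduced here.
EMB = {
    "loss": np.array([1.0, 0.0]),
    "debt": np.array([0.0, 1.0]),
    "growth": np.array([-1.0, 0.0]),
}
DIM = 2


def doc_vector(tokens, emb=EMB, dim=DIM):
    """Average the static vectors of known tokens; zeros if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)


def firm_vector(docs_by_year, emb=EMB, dim=DIM):
    """Concatenate per-year document vectors, ordered by year (assumed scheme)."""
    years = sorted(docs_by_year)
    return np.concatenate([doc_vector(docs_by_year[y], emb, dim) for y in years])


# One filing versus two consecutive years of filings for the same firm.
single = doc_vector(["loss", "debt", "unknown"])
multi = firm_vector({2019: ["growth"], 2020: ["loss", "debt"]})
```

The resulting vectors could then be fed to any standard classifier. Concatenating years keeps the temporal order visible to the classifier at the cost of a larger input; averaging across years would be an equally plausible alternative.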
Taxonomy
Topics: Financial Distress and Bankruptcy Prediction · Stock Market Forecasting Methods
