Bond Default Prediction with Text Embeddings, Undersampling and Deep Learning
Luke Jordan

TL;DR
This paper presents a deep learning approach that combines text embeddings with synthetic oversampling to predict municipal bond defaults at issuance, outperforming human estimates, linear models, and boosted ensemble models.
Contribution
It introduces a novel combination of transformer-based text embeddings, neural networks, and synthetic oversampling for default prediction in highly imbalanced municipal bond data.
Findings
Predicts 90% of defaults at issuance without bond ratings.
Improves significantly on human estimates, linear models, and boosted ensemble models on extremely imbalanced data.
Keeps false positives below 0.1% of non-defaulting bonds.
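To put these figures in context, the short calculation below works out the precision they would imply. It is an illustrative sketch only: the function name and the portfolio size of 100,000 bonds are assumptions, and the inputs use the upper bounds quoted here (a 0.2% default rate and a 0.1% false positive rate), which the source gives only as "less than" values.

```python
def implied_precision(default_rate, recall, false_positive_rate, n_bonds=100_000):
    """Share of flagged bonds that actually default, given the headline rates."""
    defaults = default_rate * n_bonds
    non_defaults = n_bonds - defaults
    true_positives = recall * defaults                   # defaults correctly flagged
    false_positives = false_positive_rate * non_defaults # healthy bonds wrongly flagged
    return true_positives / (true_positives + false_positives)


# Upper-bound inputs taken from the summary: 0.2% default rate,
# 90% of defaults caught, 0.1% of non-defaulting bonds flagged.
precision = implied_precision(default_rate=0.002, recall=0.9, false_positive_rate=0.001)
print(f"Implied precision: {precision:.2f}")
```

Under these assumed inputs, roughly 180 of about 280 flagged bonds would be true defaults. A lower actual default rate would pull precision down, and a lower actual false positive rate would push it up.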
Abstract
The special and important problems of default prediction for municipal bonds are addressed using a combination of text embeddings from a pre-trained transformer network, a fully connected neural network, and synthetic oversampling. The combination of these techniques provides significant improvement in performance over human estimates, linear models, and boosted ensemble models, on data with extreme imbalance. Fewer than 0.2% of municipal bonds default, but our technique predicts 9 out of 10 defaults at the time of issue, without using bond ratings, at a cost of false positives on less than 0.1% of non-defaulting bonds. The results hold the promise of reducing the cost of capital for local public goods, which are vital for society, and bring techniques previously used in personal credit and public equities (or national fixed income), as well as the current generation of embedding…
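The abstract does not say which synthetic oversampling method is used. As a rough illustration of the general idea, the sketch below generates new minority-class points by interpolating between a defaulted bond's embedding and one of its nearest defaulted neighbours, in the style of SMOTE. Everything here is an assumption made for illustration: the function name `smote_like_oversample`, the 4-dimensional random vectors standing in for transformer embeddings, and the choice of `k`.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to other
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical stand-ins for embedding vectors of defaulted bonds (4-d for brevity).
data_rng = random.Random(42)
defaulted = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
new_points = smote_like_oversample(defaulted, n_new=20)
augmented = defaulted + new_points
print(len(augmented))  # 25
```

In a full pipeline the augmented minority class would be combined with the non-defaulting bonds before training the classifier. Oversampling is normally applied to the training split only, so that evaluation still reflects the real class imbalance.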
Taxonomy
Topics: Credit Risk and Financial Regulations · Financial Distress and Bankruptcy Prediction · Machine Learning in Healthcare
