# Advanced spectral clustering for heterogeneous data in credit risk monitoring systems

**Authors:** Lu Han, Mengyan Li, Jiping Qiang, Zhi Su

arXiv: 2509.00546 · 2025-09-03

## TL;DR

This paper introduces Advanced Spectral Clustering (ASC), a novel method for effectively clustering heterogeneous financial and textual data in credit risk monitoring, improving cluster quality and providing actionable insights.

## Contribution

We develop ASC, which integrates financial and textual similarities with an eigenvalue-silhouette optimization, advancing spectral clustering for heterogeneous data in credit risk analysis.
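To make the idea of integrating two similarity sources concrete, here is a minimal sketch of a weighted similarity fusion followed by a spectral embedding. It is not the authors' implementation: the RBF kernel for financial variables, the cosine similarity for term counts, the convex combination with weight `alpha`, the choice of the `k` smallest Laplacian eigenvectors, and all function names are assumptions made for illustration. The paper's own weight optimization and eigenvalue-silhouette selection are not reproduced here.

```python
import numpy as np

def rbf_similarity(X, gamma=0.5):
    """Gaussian (RBF) similarity between rows of X (assumed choice for financial data)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def cosine_similarity(T):
    """Cosine similarity between rows of a non-negative term-count matrix T (assumed choice for text)."""
    norms = np.linalg.norm(T, axis=1, keepdims=True)
    Tn = T / np.maximum(norms, 1e-12)
    return np.clip(Tn @ Tn.T, 0.0, 1.0)

def fused_similarity(S_fin, S_txt, alpha):
    """Convex combination of the two similarity matrices; alpha is a hypothetical weight in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * S_fin + (1.0 - alpha) * S_txt

def spectral_embedding(S, k):
    """Return the k smallest eigenvalues of the normalized Laplacian and the row-normalized eigenvectors."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    U = vecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return vals[:k], U

# Synthetic stand-in data: 20 firms, 3 financial variables, 8 text terms.
rng = np.random.default_rng(0)
X_fin = np.vstack([rng.normal(0.0, 0.3, (10, 3)), rng.normal(3.0, 0.3, (10, 3))])
T_txt = rng.poisson(1.0, (20, 8)).astype(float)
T_txt[:10, 0] += 5.0  # first group mentions term 0 heavily
T_txt[10:, 1] += 5.0  # second group mentions term 1 heavily

S = fused_similarity(rbf_similarity(X_fin), cosine_similarity(T_txt), alpha=0.6)
eigvals, U = spectral_embedding(S, k=2)
```

The rows of `U` can then be passed to any centroid-based clusterer such as k-means. In this sketch `alpha` and `k` are fixed by hand, whereas the paper describes tuning them.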

## Key findings

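- ASC achieves a Silhouette score 18% higher than a single-type data baseline on a dataset of 1,428 SMEs.
- 51% of low-risk firms include the term 'social recruitment' in their textual records.
- ASC's robustness is confirmed across k-means, k-medians, and k-medoids (ΔIntra/Inter < 0.13, ΔSilhouette Coefficient < 0.02).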

## Abstract

Heterogeneous data, which encompass both numerical financial variables and textual records, present substantial challenges for credit monitoring. To address this issue, we propose Advanced Spectral Clustering (ASC), a method that integrates financial and textual similarities through an optimized weight parameter and selects eigenvectors using a novel eigenvalue-silhouette optimization approach. Evaluated on a dataset comprising 1,428 small and medium-sized enterprises (SMEs), ASC achieves a Silhouette score that is 18% higher than that of a single-type data baseline method. Furthermore, the resulting clusters offer actionable insights; for instance, 51% of low-risk firms are found to include the term 'social recruitment' in their textual records. The robustness of ASC is confirmed across multiple clustering algorithms, including k-means, k-medians, and k-medoids, with ΔIntra/Inter < 0.13 and ΔSilhouette Coefficient < 0.02. By bridging spectral clustering theory with heterogeneous data applications, ASC enables the identification of meaningful clusters, such as recruitment-focused SMEs exhibiting a 30% lower default risk, thereby supporting more targeted and effective credit interventions.
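The robustness figures in the abstract compare cluster-quality metrics across different clustering algorithms. The sketch below shows one plausible way to compute such differences for two labelings of the same points. The Silhouette coefficient follows its standard definition; the Intra/Inter ratio used here (mean within-cluster distance divided by mean between-cluster distance) is an assumed definition, since the abstract does not spell it out. The toy points and both labelings are made up for illustration.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def silhouette(X, labels):
    """Mean Silhouette coefficient; singleton clusters contribute 0 by convention."""
    D = pairwise_dist(X)
    labels = np.asarray(labels)
    uniq = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue
        a = D[i, own].sum() / (n_own - 1)  # mean distance to own cluster, excluding self
        b = min(D[i, labels == c].mean() for c in uniq if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def intra_inter_ratio(X, labels):
    """Assumed definition: mean within-cluster distance over mean between-cluster distance."""
    D = pairwise_dist(X)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(X), dtype=bool)
    return D[same & off_diag].mean() / D[~same].mean()

# Toy points and two hypothetical labelings standing in for two algorithms' outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
labels_a = [0, 0, 0, 1, 1, 1]
labels_b = [0, 0, 1, 1, 1, 1]  # one point assigned differently

delta_sil = abs(silhouette(X, labels_a) - silhouette(X, labels_b))
delta_ratio = abs(intra_inter_ratio(X, labels_a) - intra_inter_ratio(X, labels_b))
```

Small values of `delta_sil` and `delta_ratio` would indicate that the two labelings yield similar cluster quality, which is the sense in which the abstract reports robustness.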

---
Source: https://tomesphere.com/paper/2509.00546