# Integrated Multivariate Segmentation Tree for Heterogeneous Credit Data Analysis in Small- and Medium-Sized Enterprises

**Authors:** Lu Han, Xiuying Wang

arXiv: 2509.00550 · 2026-01-13

## TL;DR

This paper introduces the integrated multivariate segmentation tree (IMST), a framework that combines textual and financial data for SME credit evaluation. On a dataset of 1,428 Chinese SMEs it reaches 88.9% accuracy, ahead of baseline decision trees (87.4%), support vector machines, and neural networks.

## Contribution

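The paper presents a decision tree model that takes both textual and financial inputs: text is converted to numerical matrices via matrix factorization, financial features are selected with Lasso regression, and a multivariate segmentation tree is then built and pruned. The authors report gains in both accuracy and interpretability for SME credit assessment.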

## Key findings

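- IMST achieved 88.9% accuracy on a dataset of 1,428 Chinese SMEs.
- It outperformed baseline decision trees (87.4%) as well as support vector machines and neural networks.
- It showed better interpretability and computational efficiency, with a more streamlined tree and improved risk detection.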

## Abstract

Traditional decision tree models, which rely exclusively on numerical variables, often face challenges in handling high-dimensional data and are limited in their ability to incorporate textual information effectively. To address these limitations, we propose the integrated multivariate segmentation tree (IMST), a comprehensive framework designed to improve credit evaluation for small- and medium-sized enterprises (SMEs) by integrating financial data with textual sources. This method comprises three core stages: (1) transforming textual data into numerical matrices through matrix factorization, (2) selecting salient financial features using Lasso regression, and (3) constructing a multivariate segmentation tree based on either the Gini index or entropy, with weakest-link pruning applied to control model complexity. Experimental results based on a dataset of 1,428 Chinese SMEs demonstrated that IMST achieved an accuracy rate of 88.9%, surpassing both baseline decision trees (87.4%) and conventional models such as support vector machines and neural networks. Furthermore, the proposed model demonstrated superior interpretability and computational efficiency, featuring a more streamlined architecture and improved risk detection capabilities.
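The third stage named in the abstract (splitting on the Gini index or entropy, then weakest-link pruning) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear-combination split `w · x <= threshold`, the function names, and the toy data are all assumptions made for illustration, and the pruning helper uses the standard CART cost-complexity formula α = (R(t) − R(T_t)) / (|T_t| − 1), which the abstract does not spell out.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index of a label list: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a label list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_impurity(rows, labels, weights, threshold, impurity=gini):
    """Weighted child impurity of a multivariate split.

    Assumption: the split is a linear combination of features,
    sending a row left when sum(w_i * x_i) <= threshold.
    """
    left, right = [], []
    for x, y in zip(rows, labels):
        score = sum(w * v for w, v in zip(weights, x))
        (left if score <= threshold else right).append(y)
    n = len(labels)
    return (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)


def weakest_link_alpha(node_error, subtree_error, subtree_leaves):
    """Standard CART cost-complexity value for collapsing a subtree.

    alpha = (R(t) - R(T_t)) / (|T_t| - 1); the subtree with the
    smallest alpha is the "weakest link" and is pruned first.
    """
    return (node_error - subtree_error) / (subtree_leaves - 1)


# Hypothetical toy data: [text-derived factor, financial ratio];
# label 1 = default, 0 = non-default. Not taken from the paper.
rows = [[0.9, 0.2], [0.8, 0.1], [0.2, 0.7], [0.1, 0.9]]
labels = [1, 1, 0, 0]

parent_gini = gini(labels)
child_gini = split_impurity(rows, labels, weights=[1.0, -1.0], threshold=0.0)
print(parent_gini, child_gini)
```

In this toy case the split that contrasts the text factor against the financial ratio separates the two classes completely, which is the kind of single-node gain a multivariate split can offer over testing one variable at a time.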

---
Source: https://tomesphere.com/paper/2509.00550