# Distance metric learning based on structural neighborhoods for dimensionality reduction and classification performance improvement

**Authors:** Mostafa Razavi Ghods, Mohammad Hossein Moattar, Yahya Forghani

arXiv: 1902.03453 · 2020-02-21

## TL;DR

This paper introduces a novel distance metric learning method that preserves local structures and addresses data imbalance by leveraging low-dimensional manifolds, improving classification performance especially on imbalanced datasets.

## Contribution

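The proposed approach combines manifold learning with local neighborhood preservation to improve metric learning on imbalanced datasets: neighborhoods found on an embedded low-dimensional manifold determine which data points the learned metric pulls together or pushes apart in the ambient space.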

## Key findings

- Outperforms the compared approaches on numerous datasets from the UCI machine learning repository.
- Shows the largest gains when the imbalance factor is high, as on the highly imbalanced KDDCup98 dataset.
- Effectively preserves local structures while handling data imbalance.

## Abstract

Distance metric learning can be viewed as one of the fundamental interests in pattern recognition and machine learning, and it plays a pivotal role in the performance of many learning methods. One effective way to learn such a metric is to learn it from a set of labeled training samples. The issue of data imbalance is the most important challenge of recent methods. This research not only tries to preserve the local structures but also addresses the issue of imbalanced datasets. To do this, the proposed method first tries to extract a low-dimensional manifold from the input data. Then, it learns the local neighborhood structures and the relationships of the data points in the ambient space based on the adjacencies of the same data points on the embedded low-dimensional manifold. Using the local neighborhood relationships extracted from the manifold space, the proposed method learns the distance metric in a way that minimizes the distance between similar data points and maximizes their distance from dissimilar data points. Evaluations of the proposed method on numerous datasets from the UCI machine learning repository, and on the KDDCup98 dataset as the most imbalanced dataset, support the superiority of the proposed approach over other approaches, especially when the imbalance factor is high.
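
The three steps described in the abstract (embed, find neighborhoods on the embedding, learn a metric from those neighborhoods) can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: PCA stands in for the manifold extraction step, and the metric comes from a generic scatter-ratio eigenproblem rather than the paper's objective. It also does not reproduce whatever the paper does to handle class imbalance. The function names `pca_embed`, `knn_indices`, `learn_metric`, and `mahalanobis`, and the parameters `k`, `n_components`, `n_out`, and `reg`, are all hypothetical.

```python
import numpy as np

def pca_embed(X, n_components):
    # ASSUMPTION: linear PCA is a stand-in for the paper's manifold extraction.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def knn_indices(Z, k):
    # Indices of the k nearest neighbours of each row of Z (self excluded).
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def learn_metric(X, y, n_components=2, k=5, n_out=2, reg=1e-6):
    # Neighbourhoods are found in the embedded space, but the scatter
    # matrices are accumulated from differences in the ambient space.
    Z = pca_embed(X, n_components)
    nbrs = knn_indices(Z, k)
    d = X.shape[1]
    S_sim = np.zeros((d, d))  # scatter of same-label neighbour pairs
    S_dis = np.zeros((d, d))  # scatter of different-label neighbour pairs
    for i in range(X.shape[0]):
        for j in nbrs[i]:
            diff = X[i] - X[j]
            if y[i] == y[j]:
                S_sim += np.outer(diff, diff)
            else:
                S_dis += np.outer(diff, diff)
    # ASSUMPTION: a scatter-ratio criterion replaces the paper's objective.
    # Whiten the (regularized) similar scatter, then keep the directions
    # along which the dissimilar scatter is largest.
    w, V = np.linalg.eigh(S_sim + reg * np.eye(d))
    W = V @ np.diag(w ** -0.5) @ V.T
    _, U = np.linalg.eigh(W @ S_dis @ W)   # eigenvalues in ascending order
    A = W @ U[:, -n_out:]                  # d x n_out linear transform
    return A @ A.T                         # positive semidefinite matrix M

def mahalanobis(M, a, b):
    # Distance induced by M: sqrt((a - b)^T M (a - b)).
    diff = a - b
    return float(np.sqrt(max(diff @ M @ diff, 0.0)))

# Toy usage on a synthetic, imbalanced two-class problem (made-up data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 5)),
               rng.normal(2.0, 1.0, size=(10, 5))])
y = np.array([0] * 90 + [1] * 10)
M = learn_metric(X, y)
```

Because `M` is built as `A @ A.T`, it is symmetric positive semidefinite by construction, so `mahalanobis` is a valid pseudo-metric; the transform `A` also doubles as a dimensionality-reducing projection to `n_out` dimensions.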

---
Source: https://tomesphere.com/paper/1902.03453