# Reinforcement learning-driven feature selection enhanced by an evolutionary approach tuning for criminal suspect identification

**Authors:** Zhenming Gao, Zhang Jian, Seyed Jalaleddin Mousavirad

PMC · DOI: 10.1038/s41598-025-25920-6 · 2025-11-25

## TL;DR

This paper combines reinforcement learning (off-policy PPO, for feature selection and class balancing) with an enhanced differential evolution algorithm (for hyperparameter tuning) to improve criminal suspect identification from facial data.

## Contribution

A novel approach that uses Off-policy PPO for feature selection and class balancing, and differential evolution with a k-means-based mutation strategy for hyperparameter tuning, in criminal suspect identification.
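
For orientation, the standard PPO clipped surrogate objective can be sketched as below. This summary does not say how the authors' off-policy variant modifies PPO, so the sketch assumes only the textbook form, with `logp_old` standing in for the log-probabilities under whichever policy generated the stored samples. The function name and the default `eps=0.2` are illustrative choices, not taken from the paper.

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    logp_new:   log-probabilities of the taken actions under the current policy
    logp_old:   log-probabilities under the policy that generated the samples
                (assumption: in an off-policy setting this is the behaviour policy)
    advantages: advantage estimates for the same samples
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    adv = np.asarray(advantages, dtype=float)

    ratio = np.exp(logp_new - logp_old)             # importance ratio
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)  # keep the ratio near 1
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return float(np.mean(np.minimum(ratio * adv, clipped * adv)))
```

The clipping limits how much a single update can gain from moving the policy far from the one that produced the data, which is one reason reusing stored samples is usually paired with some form of ratio control.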

## Key findings

- The proposed method achieved F-measures of 89.409% (CelebA), 91.152% (LFW), 92.184% (CASIA-WebFace), and 92.202% (VGGFace2).
- Off-policy PPO improved feature selection and class balance compared to conventional methods.
- The enhanced differential evolution algorithm with k-means mutation improved hyperparameter tuning effectiveness.
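
As a reminder of what the reported metric measures, the sketch below computes an F-measure from confusion counts. It assumes the balanced form (`beta=1`, the harmonic mean of precision and recall); this summary does not state which `beta` the authors used, and the counts in the usage note are made up for illustration.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when nothing was found
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

For example, `f_measure(tp=8, fp=2, fn=8)` combines a precision of 0.8 with a recall of 0.5. Because the score ignores true negatives, it is less flattering than accuracy on imbalanced classes, which fits the paper's focus on class imbalance.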

## Abstract

Accurate identification of criminal suspects is crucial for ensuring justice and deterring future crimes. Convolutional neural networks (CNNs) are frequently used to identify suspects. However, conventional CNN-based methods often struggle with feature selection (FS), class imbalance, and hyperparameter tuning, which diminishes their overall effectiveness. To overcome these obstacles, this study introduces a strategy based on reinforcement learning (RL), specifically off-policy proximal policy optimization (Off-policy PPO), which addresses FS and class imbalance. This approach is supplemented by an enhanced differential evolution (DE) algorithm for tuning hyperparameters. We select Off-policy PPO because it reduces data needs, increases RL efficiency, and suits settings where data collection is costly. In our research, Off-policy PPO is dynamically tuned to improve FS and class balance, and it consistently surpasses conventional static approaches by adapting to the intricate dynamics of criminal suspect detection. Furthermore, the DE algorithm is enhanced with a novel mutation strategy that employs k-means clustering to identify key clusters. Our methodology is evaluated on four datasets: CelebFaces Attributes (CelebA), Labeled Faces in the Wild (LFW), Chinese Academy of Sciences Institute of Automation WebFace (CASIA-WebFace), and Visual Geometry Group Face 2 (VGGFace2). The method achieves F-measures of 89.409%, 91.152%, 92.184%, and 92.202%, respectively. These results demonstrate that the approach outperforms existing methods and advances early suspect detection, while also improving investigative strategies.
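
To make the idea of a clustering-guided mutation concrete, here is a minimal sketch of differential evolution in which k-means picks the base vector. The abstract does not describe the actual operator, so the rule used here is an assumption: cluster the population, treat the cluster with the lowest mean objective as the "key" cluster, and mutate around its best member. All names (`kmeans`, `de_kmeans`) and parameter defaults are hypothetical, and a toy objective stands in for the validation loss of a CNN.

```python
import numpy as np

def kmeans(points, k, rng, iters=10):
    """Plain Lloyd's k-means; returns a cluster label for each point."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def de_kmeans(objective, bounds, pop_size=20, k=3, F=0.5, CR=0.9, gens=100, seed=0):
    """Minimize `objective` over box `bounds` given as (low, high) pairs."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    dim = len(low)
    pop = rng.uniform(low, high, size=(pop_size, dim))
    fit = np.array([objective(x) for x in pop])
    for _ in range(gens):
        labels = kmeans(pop, k, rng)
        # Assumed rule: the non-empty cluster with the lowest mean objective
        # is the "key" cluster, and its best member is the base vector.
        present = np.unique(labels)
        best_cluster = min(present, key=lambda j: fit[labels == j].mean())
        members = np.flatnonzero(labels == best_cluster)
        base = pop[members[fit[members].argmin()]].copy()
        for i in range(pop_size):
            r1, r2 = rng.choice(pop_size, size=2, replace=False)
            mutant = np.clip(base + F * (pop[r1] - pop[r2]), low, high)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True  # at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            f_trial = objective(trial)
            if f_trial <= fit[i]:  # greedy selection (minimization)
                pop[i], fit[i] = trial, f_trial
    best = fit.argmin()
    return pop[best], fit[best]
```

A call such as `de_kmeans(lambda x: float(np.sum(x**2)), [(-5, 5)] * 3)` searches a three-dimensional box for the minimum of a simple quadratic. In a hyperparameter-tuning setting, each coordinate would instead encode a setting such as a learning rate, and each objective evaluation would involve training and validating a model.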

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12647716/full.md

---
Source: https://tomesphere.com/paper/PMC12647716