# Bioinspired Optimization for Feature Selection in Post-Compliance Risk Prediction

**Authors:** Álex Paz, Broderick Crawford, Eric Monfroy, Eduardo Rodriguez-Tello, José Barrera-García, Felipe Cisternas-Caneo, Benjamín López Cortés, Yoslandy Lazo, Andrés Yáñez, Álvaro Peña Fritz, Ricardo Soto

PMC · DOI: 10.3390/biomimetics11030190 · Biomimetics · 2026-03-05

## TL;DR

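This paper evaluates a wrapper-based, bio-inspired feature selection framework for post-compliance risk prediction on class-imbalanced administrative data. The optimized subsets shrink a 76-variable space to roughly 16–33 features while raising or sustaining minority-class recall, depending on the classifier.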

## Contribution

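The paper proposes a wrapper-based metaheuristic feature selection framework for post-compliance income declaration prediction. It couples swarm-inspired optimization with supervised classifiers (k-nearest neighbors, Random Forest, LightGBM) under a weighted objective that jointly prioritizes minority-class recall and subset compactness.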
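A weighted wrapper objective of this kind can be sketched as follows. This is a minimal illustration, not the authors' formulation: the linear form, the function names `minority_recall` and `wrapper_fitness`, and the weight `alpha = 0.9` are all assumptions, since this summary does not state the exact objective or its weights.

```python
from typing import Sequence


def minority_recall(y_true: Sequence[int], y_pred: Sequence[int], minority: int = 1) -> float:
    """Recall on the minority class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p != minority)
    return tp / (tp + fn) if (tp + fn) else 0.0


def wrapper_fitness(mask: Sequence[int], recall: float, alpha: float = 0.9) -> float:
    """Cost to minimize for one candidate feature subset.

    mask   : binary vector, 1 = feature kept, 0 = feature dropped.
    recall : minority-class recall of a classifier trained on the kept features.
    alpha  : hypothetical weight on recall error (value not given in the source).
    """
    n_selected = sum(mask)
    if n_selected == 0:
        return 1.0  # an empty subset gets the worst possible cost
    recall_error = 1.0 - recall
    size_ratio = n_selected / len(mask)
    return alpha * recall_error + (1.0 - alpha) * size_ratio
```

In a wrapper setting, the optimizer proposes a binary mask, a classifier is trained on the selected columns, and the resulting minority-class recall is passed to the cost function. A higher `alpha` favors recall over compactness.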

## Key findings

- Optimized configurations raised minority-class recall from 0.284 to 0.849 for k-nearest neighbors and from 0.471 to 0.932 for Random Forest.
- LightGBM maintained high recall (0.935–0.943 on average) with low dispersion on optimized subsets, suggesting representational stabilization and dimensional compression rather than large absolute recall gains.
- Optimized subsets retained approximately 16–33 features on average from the original 76-variable space.

## Abstract

Bio-inspired metaheuristic optimization offers flexible search mechanisms for high-dimensional predictive problems under operational constraints. In administrative risk prediction settings, class imbalance and feature redundancy challenge conventional learning pipelines. This study evaluates a wrapper-based metaheuristic feature selection framework for post-compliance income declaration prediction using real longitudinal administrative records. The proposed approach integrates swarm-inspired optimization with supervised classifiers under a weighted objective function jointly prioritizing minority-class recall and subset compactness. Robustness is assessed through 31 independent stochastic runs per configuration. The empirical results indicate that performance effects are learner-dependent. For variance-prone classifiers, substantial minority-class recall gains are observed, with recall increasing from 0.284 to 0.849 for k-nearest neighbors and from 0.471 to 0.932 for Random Forest under optimized configurations. For LightGBM, optimized models maintain high recall levels (0.935–0.943 on average) with low dispersion, suggesting representational stabilization and dimensional compression rather than large absolute recall improvements. Optimized subsets retain approximately 16–33 features on average from the original 76-variable space. Within the evaluated experimental protocol, the findings show that metaheuristic-driven wrapper feature selection can reshape predictive representations under class imbalance, enabling simultaneous control of minority-class performance and feature dimensionality. Formal institutional deployment and cross-domain generalization remain subjects for future investigation.
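The robustness protocol described above (31 independent stochastic runs per configuration) can be sketched as below. Only the run count and the 76-feature dimension come from the abstract. Everything else is an assumption for illustration: `toy_cost` is a hypothetical stand-in for training and scoring a classifier, and a simple bit-flip local search is swapped in for the swarm-inspired optimizers used in the paper.

```python
import random
import statistics
from typing import Callable, List, Tuple

N_RUNS = 31      # independent stochastic runs, from the abstract
N_FEATURES = 76  # original feature count, from the abstract


def toy_cost(mask: List[int]) -> float:
    """Hypothetical cost: pretends the first 20 features are informative
    and adds a small penalty proportional to subset size."""
    if not any(mask):
        return 1.0
    informative = sum(mask[:20]) / 20
    return 0.9 * (1.0 - informative) + 0.1 * (sum(mask) / len(mask))


def bit_flip_search(cost: Callable[[List[int]], float], n_features: int,
                    iters: int, rng: random.Random) -> Tuple[List[int], float]:
    """Bit-flip local search (NOT a swarm algorithm; placeholder optimizer)."""
    best = [rng.randint(0, 1) for _ in range(n_features)]
    best_cost = cost(best)
    for _ in range(iters):
        cand = best[:]
        j = rng.randrange(n_features)
        cand[j] = 1 - cand[j]  # flip one feature in or out
        c = cost(cand)
        if c <= best_cost:
            best, best_cost = cand, c
    return best, best_cost


def repeated_runs(n_runs: int = N_RUNS, iters: int = 200) -> List[Tuple[float, int]]:
    """Run the search once per seed; return (final cost, subset size) per run."""
    results = []
    for seed in range(n_runs):
        rng = random.Random(seed)  # one seed per run keeps runs independent
        mask, cost = bit_flip_search(toy_cost, N_FEATURES, iters, rng)
        results.append((cost, sum(mask)))
    return results


results = repeated_runs()
costs = [c for c, _ in results]
sizes = [k for _, k in results]
print(f"cost: mean={statistics.mean(costs):.3f}, sd={statistics.stdev(costs):.3f}")
print(f"subset size: mean={statistics.mean(sizes):.1f}")
```

Reporting the mean and dispersion across seeds, rather than a single best run, is what allows statements such as "high recall with low dispersion" to be made about a stochastic optimizer.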

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13024445/full.md

## Figures

30 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13024445/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC13024445/full.md

---
Source: https://tomesphere.com/paper/PMC13024445