# A Multi-Task Ensemble Strategy for Gene Selection and Cancer Classification

**Authors:** Suli Lin, Zhizhe Lin, Jin Zhang, Man-Fai Leung

PMC · DOI: 10.3390/bioengineering12111245 · 2025-11-13

## TL;DR

This paper introduces a method for selecting informative genes and classifying cancer types from gene expression data, improving both classification accuracy and the consistency of the selected genes.

## Contribution

A novel multi-task ensemble strategy that couples repeated sampling with joint gene selection and classification under ℓ2,1 group-sparsity regularization, improving the stability of the selected genes and classification performance.
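For reference, the ℓ2,1 norm has a standard definition, given below together with a generic multi-task logistic objective that uses it. The objective is an assumption for illustration: it treats each of the $T$ training subsets as a task with its own weight column $w_t$ and intercept $b_t$, uses labels $\tilde y_{ti} \in \{-1, +1\}$, and may differ from the paper's exact formulation (weighting, intercepts, loss scaling). Rows $w^{j}$ of $W \in \mathbb{R}^{p \times T}$ collect the weights of gene $j$ across all tasks.

$$
\|W\|_{2,1} = \sum_{j=1}^{p} \|w^{j}\|_2 = \sum_{j=1}^{p} \sqrt{\sum_{t=1}^{T} W_{jt}^{2}}
$$

$$
\min_{W,\,b}\; \sum_{t=1}^{T} \frac{1}{n_t} \sum_{i=1}^{n_t} \log\!\Bigl(1 + \exp\bigl(-\tilde y_{ti}\,(x_{ti}^{\top} w_t + b_t)\bigr)\Bigr) + \lambda \|W\|_{2,1}
$$

With a proximal gradient step of size $\eta$, each row is updated by group soft-thresholding, $w^{j} \leftarrow \max\!\bigl(0,\, 1 - \eta\lambda / \|v^{j}\|_2\bigr)\, v^{j}$, where $v^{j}$ is the row after the plain gradient step. A gene is therefore kept or discarded jointly in all tasks, which is why the penalty favours a gene subset shared across tasks.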

## Key findings

- The method outperforms baseline methods in classification accuracy on real gene expression datasets.
- The selected genes are more consistent across tasks than those chosen by existing methods.
- The framework supports integration with standard classifiers such as logistic regression and SVMs.

## Abstract

Gene expression-based tumor classification aims to distinguish tumor types based on gene expression profiles. This task is difficult due to the high dimensionality of gene expression data and limited sample sizes. Most datasets contain tens of thousands of genes but only a small number of samples. As a result, selecting informative genes is necessary to improve classification performance and model interpretability. Many existing gene selection methods fail to produce stable and consistent results, especially when training data are limited. To address this, we propose a multi-task ensemble strategy that combines repeated sampling with joint feature selection and classification. The method generates multiple training subsets and applies multi-task logistic regression with ℓ2,1 group sparsity regularization to select a subset of genes that appears consistently across tasks. This promotes stability and reduces redundancy. The framework supports integration with standard classifiers such as logistic regression and support vector machines. It performs both gene selection and classification in a single process. We evaluate the method on simulated and real gene expression datasets. The results show that it outperforms several baseline methods in classification accuracy and the consistency of selected genes.
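The pipeline described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: tasks are random 70% subsets drawn without replacement, the ℓ2,1-regularized multi-task logistic regression is solved by proximal gradient descent, genes are ranked by the ℓ2 norm of their weight rows, and predictions average the per-task models. The function names (`make_tasks`, `fit_l21_logreg`, `select_genes`, `predict`), the hyperparameters, and the synthetic data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_tasks(X, y, n_tasks, frac, rng):
    # Repeated sampling (assumed scheme): each task is a random subset,
    # drawn without replacement, of the training samples.
    m = max(2, int(frac * len(X)))
    tasks = []
    for _ in range(n_tasks):
        idx = rng.sample(range(len(X)), m)
        tasks.append(([X[i] for i in idx], [y[i] for i in idx]))
    return tasks


def fit_l21_logreg(tasks, n_genes, lam, step, n_iter):
    # Proximal gradient descent on
    #   sum_t mean_logistic_loss_t(w_t, b_t) + lam * ||W||_{2,1},
    # where W[j][t] is gene j's weight in task t. Intercepts are unpenalized.
    T = len(tasks)
    W = [[0.0] * T for _ in range(n_genes)]
    b = [0.0] * T
    for _ in range(n_iter):
        G = [[0.0] * T for _ in range(n_genes)]
        for t, (Xt, yt) in enumerate(tasks):
            gb = 0.0
            for xi, yi in zip(Xt, yt):
                z = b[t] + sum(W[j][t] * xi[j] for j in range(n_genes))
                r = (sigmoid(z) - yi) / len(Xt)  # labels are 0/1 here
                gb += r
                for j in range(n_genes):
                    G[j][t] += r * xi[j]
            b[t] -= step * gb
        # Gradient step, then row-wise group soft-thresholding (the l2,1 prox):
        # a gene's weights are shrunk jointly and zeroed in every task at once.
        for j in range(n_genes):
            v = [W[j][t] - step * G[j][t] for t in range(T)]
            norm = math.sqrt(sum(vt * vt for vt in v))
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            W[j] = [shrink * vt for vt in v]
    return W, b


def select_genes(W, k):
    # Rank genes by the l2 norm of their weight row across all tasks.
    scores = [math.sqrt(sum(w * w for w in row)) for row in W]
    top = sorted(range(len(W)), key=lambda j: -scores[j])[:k]
    return sorted(top), scores


def predict(W, b, x):
    # Ensemble prediction (simplification): average the per-task probabilities.
    probs = [sigmoid(b[t] + sum(W[j][t] * x[j] for j in range(len(W))))
             for t in range(len(b))]
    return 1 if sum(probs) / len(probs) >= 0.5 else 0


# Hypothetical synthetic data: only genes 0, 1 and 2 carry signal.
rng = random.Random(0)
n, p = 80, 20
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if x[0] + x[1] - x[2] + 0.3 * rng.gauss(0.0, 1.0) > 0 else 0 for x in X]

tasks = make_tasks(X, y, n_tasks=5, frac=0.7, rng=rng)
W, b = fit_l21_logreg(tasks, p, lam=0.1, step=0.5, n_iter=200)
genes, scores = select_genes(W, k=3)
acc = sum(predict(W, b, x) == yi for x, yi in zip(X, y)) / n
print("selected genes:", genes, "training accuracy:", round(acc, 2))
```

Because the penalty acts on whole rows of `W`, a gene is either kept in every task or dropped from all of them, which is the mechanism behind a gene subset that appears consistently across tasks. In the framework described above, a standard classifier such as logistic regression or an SVM would then be trained on the selected genes; averaging the task models here is only a simplification.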

## Full-text entities

- **Diseases:** Cancer (MESH:D009369)

## Figures

The complete paper contains 6 figures with captions: https://tomesphere.com/paper/PMC12650736/full.md

---
Source: https://tomesphere.com/paper/PMC12650736