A Multi-Task Ensemble Strategy for Gene Selection and Cancer Classification
Suli Lin, Zhizhe Lin, Jin Zhang, Man-Fai Leung

TL;DR
This paper introduces a new method for selecting important genes and classifying cancer types using gene expression data, improving accuracy and consistency.
Contribution
A novel multi-task ensemble strategy that combines gene selection and classification with ℓ2,1 regularization for improved stability and performance.
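For reference, the ℓ2,1 norm and a generic multi-task logistic regression objective that uses it can be written as below. This is the standard textbook form, not necessarily the paper's exact formulation, which this summary does not give. The notation is assumed here: W is a p × T weight matrix (p genes, T tasks) with rows w^j and columns w_t, b_t is a per-task intercept, labels y are in {-1, +1}, n_t is the number of samples in task t, and λ is the regularization weight.

```latex
\|W\|_{2,1} \;=\; \sum_{j=1}^{p} \|w^{j}\|_{2}
            \;=\; \sum_{j=1}^{p} \sqrt{\sum_{t=1}^{T} W_{jt}^{2}}

\min_{W,\,b} \;\; \sum_{t=1}^{T} \frac{1}{n_t} \sum_{i=1}^{n_t}
   \log\!\Bigl(1 + \exp\bigl(-y_i^{(t)} \bigl(x_i^{(t)\top} w_t + b_t\bigr)\bigr)\Bigr)
   \;+\; \lambda \,\|W\|_{2,1}
```

Because the penalty sums the Euclidean norms of whole rows, it tends to set an entire row to zero at once, so a gene is either kept for all tasks or dropped for all tasks.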
Findings
The method outperforms baseline methods in classification accuracy on real gene expression datasets.
The selected genes are more consistent across tasks than those chosen by existing methods.
The framework supports integration with standard classifiers like logistic regression and SVMs.
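One common way to quantify how consistent selected gene sets are is the mean pairwise Jaccard index. The sketch below illustrates that idea; it is an assumption that a measure of this kind is appropriate here, since this summary does not state which consistency metric the paper uses. The function names and the gene symbols in the usage note are placeholders, not results from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty selections are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(gene_sets):
    """Average Jaccard index over all unordered pairs of gene sets."""
    pairs = list(combinations(gene_sets, 2))
    if not pairs:
        raise ValueError("need at least two gene sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

For example, `mean_pairwise_jaccard([{"TP53", "BRCA1", "MYC"}, {"TP53", "BRCA1", "EGFR"}, {"TP53", "MYC", "EGFR"}])` compares three hypothetical selections; a value of 1.0 would mean every run picked exactly the same genes.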
Abstract
Gene expression-based tumor classification aims to distinguish tumor types based on gene expression profiles. This task is difficult due to the high dimensionality of gene expression data and limited sample sizes. Most datasets contain tens of thousands of genes but only a small number of samples. As a result, selecting informative genes is necessary to improve classification performance and model interpretability. Many existing gene selection methods fail to produce stable and consistent results, especially when training data are limited. To address this, we propose a multi-task ensemble strategy that combines repeated sampling with joint feature selection and classification. The method generates multiple training subsets and applies multi-task logistic regression with ℓ2,1 group sparsity regularization to select a subset of genes that appears consistently across tasks. This promotes…
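The pipeline described in the abstract (repeated sampling into training subsets, then multi-task logistic regression with an ℓ2,1 penalty) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes binary labels in {0, 1}, treats each random subset as one task, and uses plain proximal gradient descent as the solver. All function names, the subset fraction, and the hyperparameters `lam`, `step`, and `n_iter` are hypothetical choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def prox_l21(W, threshold):
    """Row-wise group soft-thresholding: the proximal operator of threshold * ||W||_{2,1}."""
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(row_norms, 1e-12))
    return W * scale


def make_subsets(n_samples, n_tasks, fraction, rng):
    """Draw one random training subset (indices, without replacement) per task."""
    size = max(2, int(fraction * n_samples))
    return [rng.choice(n_samples, size=size, replace=False) for _ in range(n_tasks)]


def fit_multitask_l21(X, y, subsets, lam=0.1, step=0.1, n_iter=500):
    """Proximal gradient descent for multi-task logistic regression with an l2,1 penalty.

    X: (n_samples, n_genes) expression matrix; y: labels in {0, 1}.
    Returns W of shape (n_genes, n_tasks) and per-task intercepts b.
    """
    n_genes = X.shape[1]
    W = np.zeros((n_genes, len(subsets)))
    b = np.zeros(len(subsets))
    for _ in range(n_iter):
        G = np.zeros_like(W)
        for t, idx in enumerate(subsets):
            Xt, yt = X[idx], y[idx]
            residual = sigmoid(Xt @ W[:, t] + b[t]) - yt
            G[:, t] = Xt.T @ residual / len(idx)  # gradient of the mean logistic loss
            b[t] -= step * residual.mean()        # intercepts are not penalized
        # Gradient step on the smooth loss, then shrink whole rows (genes) jointly.
        W = prox_l21(W - step * G, step * lam)
    return W, b


def selected_genes(W, tol=1e-8):
    """Indices of genes whose weight row is nonzero, i.e. kept across all tasks."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > tol)
```

A typical call would be `subsets = make_subsets(len(y), 5, 0.7, np.random.default_rng(0))` followed by `W, b = fit_multitask_l21(X, y, subsets, lam=0.5)` and `selected_genes(W)`. Larger values of `lam` zero out more rows of `W`, so fewer genes survive; the fixed step size is a simplification and may need tuning or a line search on real expression data.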
Taxonomy
Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks
