Improving Omics-Based Classification: The Role of Feature Selection and Synthetic Data Generation
Diego Perazzolo, Pietro Fanton, Ilaria Barison, Marny Fedrigo, Annalisa Angelini, Chiara Castellani, Enrico Grisan

TL;DR
This paper introduces a machine learning framework that combines feature selection and synthetic data generation to improve classification accuracy and interpretability in high-dimensional, small-sample omics datasets, demonstrating enhanced generalization.
Contribution
The study proposes a novel integrated pipeline that leverages feature selection and data augmentation to boost interpretability and performance in omics classification tasks with limited samples.
Findings
Synthetic data improves model generalization.
Feature selection enhances interpretability.
Pipeline maintains performance on larger test sets.
Abstract
Given the increasing complexity of omics datasets, a key challenge is not only improving classification performance but also enhancing the transparency and reliability of model decisions. Effective model performance and feature selection are fundamental for explainability and reliability. In many cases, high-dimensional omics datasets suffer from a limited number of samples due to clinical constraints, patient conditions, phenotype rarity, and other factors. Current omics-based classification models often offer limited interpretability, making it difficult to extract meaningful insights in settings where trust and reproducibility are critical. This study presents a machine-learning-based classification framework that integrates feature selection with data augmentation techniques to achieve high classification accuracy while ensuring better interpretability. Using the publicly…
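This excerpt does not say which feature-selection or augmentation methods the framework uses, so the following is only a minimal sketch of the general pattern, under stated assumptions. An ANOVA F-score filter stands in for feature selection. A simplified SMOTE-like interpolation stands in for synthetic data generation; it uses random same-class pairs rather than nearest neighbours. A nearest-centroid rule stands in for the classifier. All function names (`f_scores`, `select_top_k`, `interpolate_augment`, `make_data`, and the centroid helpers) and the toy data are hypothetical. The point it illustrates is ordering: selection and augmentation are fit on a small training set only, and evaluation uses a larger, untouched test set.

```python
import numpy as np

def f_scores(X, y):
    """One-way ANOVA F-statistic per feature (between-class vs within-class variance)."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        ss_between += len(Xc) * (mc - grand_mean) ** 2
        ss_within += ((Xc - mc) ** 2).sum(axis=0)
    # Small epsilon guards against zero within-class variance.
    return (ss_between / (k - 1)) / (ss_within / (n - k) + 1e-12)

def select_top_k(X, y, k):
    """Indices of the k features with the highest F-score (hypothetical filter step)."""
    return np.argsort(f_scores(X, y))[::-1][:k]

def interpolate_augment(X, y, n_per_class, rng):
    """SMOTE-like augmentation (assumption): new points on segments between
    random pairs of same-class samples; original samples are kept."""
    X_parts, y_parts = [X], [y]
    for c in np.unique(y):
        Xc = X[y == c]
        i = rng.integers(0, len(Xc), n_per_class)
        j = rng.integers(0, len(Xc), n_per_class)
        lam = rng.random((n_per_class, 1))  # interpolation weight in [0, 1)
        X_parts.append(Xc[i] + lam * (Xc[j] - Xc[i]))
        y_parts.append(np.full(n_per_class, c))
    return np.vstack(X_parts), np.concatenate(y_parts)

def nearest_centroid_fit(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    classes, centroids = model
    # Squared Euclidean distance from every sample to every class centroid.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def make_data(n, p, informative, rng):
    """Toy 'omics-like' data: many noise features, a few shifted for class 1."""
    y = np.arange(n) % 2  # balanced labels
    X = rng.normal(size=(n, p))
    X[:, :informative] += 1.5 * y[:, None]
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_data(20, 500, 5, rng)   # few samples, many features
X_test, y_test = make_data(200, 500, 5, rng)    # larger held-out test set

selected = select_top_k(X_train, y_train, k=10)  # fit on training data only
X_aug, y_aug = interpolate_augment(X_train[:, selected], y_train, 50, rng)
model = nearest_centroid_fit(X_aug, y_aug)
accuracy = (nearest_centroid_predict(model, X_test[:, selected]) == y_test).mean()
print(f"selected features: {sorted(selected.tolist())}")
print(f"test accuracy: {accuracy:.2f}")
```

The nearest-centroid classifier is chosen only to keep the example dependency-light; the sketch shows where each step sits in the pipeline and makes no claim about reproducing the paper's results.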
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies
Methods: Feature Selection
