Feature-to-Image Data Augmentation: Improving Model Feature Extraction with Cluster-Guided Synthetic Samples
Yasaman Haghbin, Hadi Moradi, Reshad Hosseini

TL;DR
This paper presents FICAug, a feature-to-image data augmentation method that enhances model generalization in low-resource settings by generating structured synthetic samples through clustering and neural network projection.
Contribution
FICAug introduces a novel framework combining feature clustering, Gaussian sampling, and neural network projection to improve data augmentation and model performance.
Findings
Achieved 84.09% cross-validation accuracy with feature augmentation.
Boosted ResNet-18 accuracy to 88.63% with reconstructed images.
Significantly improved classification performance in limited data scenarios.
Abstract
Data generation techniques are a growing trend in machine learning because model performance depends heavily on the amount of training data available. However, in many real-world applications, particularly in medical and low-resource domains, collecting large datasets is difficult due to resource constraints, and models trained on the resulting small datasets tend to overfit and generalize poorly. This study introduces FICAug, a novel feature-to-image data augmentation framework designed to improve model generalization under limited data conditions by generating structured synthetic samples. FICAug first operates in the feature space, where the original data are clustered using the k-means algorithm. Within pure-label clusters, synthetic data are generated through Gaussian sampling to increase diversity while maintaining label consistency. These synthetic features are then projected…
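The feature-space stage described in the abstract (k-means clustering, then Gaussian sampling inside pure-label clusters) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `kmeans` and `augment_pure_clusters`, the parameters `k` and `n_new`, and the choice of an independent (diagonal) Gaussian per cluster are all assumptions made for illustration, since the abstract does not specify how the Gaussian is parameterized. The later projection of synthetic features back to image space is not covered here.

```python
import random
from statistics import mean, pstdev

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [mean(dim) for dim in zip(*members)]
    return assign

def augment_pure_clusters(features, labels, k=2, n_new=5, seed=0):
    """Hypothetical helper: sample synthetic feature vectors from a
    per-cluster diagonal Gaussian (an assumption), but only for clusters
    whose members all share one label."""
    rng = random.Random(seed)
    assign = kmeans(features, k, seed=seed)
    synth_x, synth_y = [], []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        cluster_labels = {labels[i] for i in idx}
        if len(idx) < 2 or len(cluster_labels) != 1:
            continue  # skip mixed-label or too-small clusters
        members = [features[i] for i in idx]
        mu = [mean(dim) for dim in zip(*members)]
        sigma = [pstdev(dim) for dim in zip(*members)]
        label = cluster_labels.pop()
        for _ in range(n_new):
            # Each synthetic vector inherits the cluster's single label.
            synth_x.append([rng.gauss(m, s) for m, s in zip(mu, sigma)])
            synth_y.append(label)
    return synth_x, synth_y

# Toy usage: two well-separated groups of 2-D features, one label each.
toy_features = [[0, 0], [0, 1], [1, 0], [1, 1],
                [10, 10], [10, 11], [11, 10], [11, 11]]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
new_x, new_y = augment_pure_clusters(toy_features, toy_labels, k=2, n_new=5)
```

Restricting sampling to pure-label clusters is what keeps labels consistent: a cluster that mixes classes has no single label to give its synthetic points, so this sketch skips it rather than guessing.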
Taxonomy
Topics: Artificial Intelligence in Healthcare · Big Data Technologies and Applications
