Multidimensional Scaling for Gene Sequence Data with Autoencoders

Pulasthi Wickramasinghe; Geoffrey Fox

arXiv:2104.09014·cs.AI·April 20, 2021

TL;DR

This paper introduces an autoencoder-based dimensionality reduction model for gene sequence data that scales to datasets with millions of sequences, produces results comparable to state-of-the-art MDS algorithms, and handles out-of-sample points with over 99.5% accuracy.

Contribution

The paper presents an autoencoder-based model for multidimensional scaling of gene sequences that scales to large datasets with minimal resource requirements, where state-of-the-art MDS algorithms become infeasible, while attaining comparable results.

Findings

01 Scales to millions of gene sequences with minimal resources

02 Achieves over 99.5% accuracy on out-of-sample data

03 Comparable results to state-of-the-art MDS algorithms

Abstract

Multidimensional scaling (MDS) has long played a vital role in analysing gene sequence data to identify clusters and patterns. However, the computational complexity and memory requirements of state-of-the-art MDS algorithms make them infeasible to scale to large datasets. In this paper we present an autoencoder-based dimension reduction model that scales easily to datasets containing millions of gene sequences while attaining results comparable to state-of-the-art MDS algorithms with minimal resource requirements. In our experiments, the model also supports out-of-sample data points with over 99.5% accuracy. The proposed model is evaluated against DAMDS on a real-world fungi gene sequence dataset. The results showcase the effectiveness of the autoencoder-based dimension reduction model and its advantages.
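To make the idea concrete, here is a minimal NumPy sketch of autoencoder-based dimension reduction with out-of-sample embedding. It is an illustration under stated assumptions, not the architecture from the paper. In particular, describing each item by its distances to a fixed set of reference items is assumed here rather than taken from the abstract. The one-hidden-layer network, the layer sizes, the learning rate, and the helper names (`distance_features`, `encode`, `train`) are likewise hypothetical. Synthetic 3-D points with Euclidean distances stand in for gene sequences and alignment distances.

```python
import numpy as np

def distance_features(points, refs):
    # Stand-in for sequence distances (assumption): Euclidean distance
    # from every point to every reference point, shape (n_points, n_refs).
    return np.linalg.norm(points[:, None, :] - refs[None, :, :], axis=2)

def encode(p, X):
    # Encoder: tanh hidden layer followed by a linear map to the latent space.
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

def train(X, latent=3, hidden=16, lr=0.1, epochs=1000, seed=0):
    # Full-batch gradient descent on the mean squared reconstruction error.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    p = {
        "W1": rng.normal(0, 0.1, (d, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, latent)), "b2": np.zeros(latent),
        "W3": rng.normal(0, 0.1, (latent, d)), "b3": np.zeros(d),
    }
    losses = []
    for _ in range(epochs):
        h, z = encode(p, X)
        err = (z @ p["W3"] + p["b3"]) - X      # linear decoder residual
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size           # d(loss)/d(reconstruction)
        g_z = g_out @ p["W3"].T                # back through the decoder
        g_a = (g_z @ p["W2"].T) * (1.0 - h ** 2)  # back through tanh
        grads = {
            "W3": z.T @ g_out, "b3": g_out.sum(axis=0),
            "W2": h.T @ g_z, "b2": g_z.sum(axis=0),
            "W1": X.T @ g_a, "b1": g_a.sum(axis=0),
        }
        for k in p:
            p[k] -= lr * grads[k]
    return p, losses

# Synthetic data: two clusters of 3-D points and 20 random reference points.
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(0.0, 0.3, (100, 3)),
                    rng.normal(2.0, 0.3, (100, 3))])
refs = rng.normal(1.0, 1.0, (20, 3))

X = distance_features(points, refs)
scale = X.max()                  # scale inputs to [0, 1]
params, losses = train(X / scale)

# Out-of-sample points need only their distances to the references and a
# forward pass through the trained encoder; no retraining is required.
new_points = rng.normal(2.0, 0.3, (5, 3))
_, new_embedding = encode(params, distance_features(new_points, refs) / scale)
print(losses[0], losses[-1], new_embedding.shape)
```

In this sketch, embedding a new point costs one forward pass whose size depends on the number of reference points, not on the size of the training set. A classical MDS solver instead works on the full pairwise distance matrix.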
