Ranking labs-of-origin for genetically engineered DNA using Metric Learning
I. Muniz, F. H. F. Camargo, A. Marques

TL;DR
This paper presents a metric learning approach to rank labs-of-origin for genetically engineered DNA, outperforming traditional methods and providing versatile embeddings for various genetic attribution tasks.
Contribution
The authors introduce a novel metric learning method that generates embeddings for DNA sequences and labs, improving ranking accuracy and enabling multiple downstream applications.
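The summary does not say which metric-learning objective the authors used. For illustration only, the sketch below assumes a proxy-based objective: each lab has its own embedding, and a sequence embedding is scored against every lab by scaled cosine similarity, with a cross-entropy loss that rewards the true lab scoring highest. The function names, the `scale` value, and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: a proxy-based softmax loss is used here as a stand-in;
# it is not confirmed to be the authors' actual objective.


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def proxy_softmax_loss(seq_emb: Sequence[float],
                       lab_embs: List[Sequence[float]],
                       true_lab: int,
                       scale: float = 16.0) -> float:
    """Cross-entropy of the true lab under a softmax over scaled cosine similarities.

    Minimizing this pulls a sequence toward its own lab embedding and
    pushes it away from every other lab embedding.
    """
    logits = [scale * cosine(seq_emb, lab) for lab in lab_embs]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[true_lab]


if __name__ == "__main__":
    labs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical lab embeddings
    seq = [0.9, 0.1]                               # hypothetical sequence embedding
    print(proxy_softmax_loss(seq, labs, true_lab=0))  # small: seq lies near lab 0
    print(proxy_softmax_loss(seq, labs, true_lab=1))  # larger: seq lies far from lab 1
```

Because sequences and labs live in the same space under this kind of objective, the trained lab vectors double as lab embeddings, which matches the summary's claim that the method yields embeddings for both.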
Findings
Outperforms classic training methods in lab-of-origin ranking
Generates useful embeddings for clustering and feature extraction
Supports multiple genetic attribution tasks
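Once sequences and labs share an embedding space, ranking candidate labs-of-origin for a query sequence can be as simple as sorting labs by similarity. A minimal sketch, assuming cosine similarity as the ranking score and a top-k hit rate as the evaluation metric (the summary specifies neither); the lab IDs and vectors are made-up toy values.

```python
import math
from typing import Dict, List, Sequence

# Assumption: cosine similarity ranks labs; top-k accuracy scores the rankings.


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_labs(seq_emb: Sequence[float],
              lab_embs: Dict[str, Sequence[float]],
              k: int = 10) -> List[str]:
    """Return up to k lab IDs ordered from most to least similar to seq_emb."""
    ranked = sorted(lab_embs, key=lambda lab_id: cosine(seq_emb, lab_embs[lab_id]),
                    reverse=True)
    return ranked[:k]


def top_k_accuracy(rankings: List[List[str]], true_labs: List[str]) -> float:
    """Fraction of sequences whose true lab appears anywhere in their top-k list."""
    if not true_labs or len(rankings) != len(true_labs):
        raise ValueError("rankings and true_labs must be non-empty and the same length")
    hits = sum(1 for ranked, true in zip(rankings, true_labs) if true in ranked)
    return hits / len(true_labs)


if __name__ == "__main__":
    labs = {"lab_A": [1.0, 0.0], "lab_B": [0.0, 1.0], "lab_C": [0.7, 0.7]}  # toy values
    top2 = rank_labs([0.9, 0.2], labs, k=2)
    print(top2)                               # ['lab_A', 'lab_C']
    print(top_k_accuracy([top2], ["lab_C"]))  # 1.0
```

A top-k metric suits attribution because a short ranked list of candidate labs is useful even when the single best guess is wrong.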
Abstract
With constant advances in genetic engineering, a common concern is being able to identify the lab-of-origin of genetically engineered DNA sequences. To address this, AltLabs hosted the Genetic Engineering Attribution Challenge, inviting teams to propose new tools for the problem. Here we present our method for ranking the most likely labs-of-origin and for generating embeddings of DNA sequences and labs. These embeddings can also be used for other tasks, such as clustering DNA sequences and labs, or serving as features for Machine Learning models applied to other problems. This work demonstrates that our method outperforms the classic training method for this task while also producing these reusable embeddings.
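The abstract says the embeddings can be used to cluster DNA sequences and labs but does not name a clustering algorithm. As an assumed stand-in, the sketch below runs plain k-means (Lloyd's algorithm) over embedding vectors; the four toy sequence embeddings are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: plain k-means stands in for whatever clustering the authors applied.


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points: List[Sequence[float]], k: int,
           iters: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means; returns (cluster label per point, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    seq_embs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]  # toy embeddings
    labels, _ = kmeans(seq_embs, k=2)
    print(labels)  # [0, 0, 1, 1]
```

Initializing from the first k points keeps the sketch deterministic; in practice k-means++ initialization with several restarts is the more common choice. The same function applies unchanged to lab embeddings.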
Taxonomy
Topics: SARS-CoV-2 detection and testing · Biomedical and Engineering Education · Image Processing Techniques and Applications
