SynVectorDB: embedding-based retrieval system for synthetic biology parts
Hao Li, Jiani Hu, Jie Song, Wei Zhou

TL;DR
SynVectorDB is an embedding-based retrieval system that helps researchers find synthetic biology parts through semantic search and a three-level hierarchical classification.
Contribution
A novel three-level classification system and embedding-based semantic search for biological parts.
Findings
SynVectorDB integrates 19,850 biological parts from multiple sources with systematic curation.
BGE-M3 embeddings in a vector database significantly improve semantic search over keyword methods.
The system offers cloud-based and open-source deployment options with SBOL3 compatibility.
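The embedding-based search described in the findings can be illustrated with a minimal sketch. It is not the SynVectorDB implementation: `toy_embed` is a hypothetical stand-in for BGE-M3 (hashed character trigrams instead of a learned model), `TinyVectorIndex` is a hypothetical in-memory substitute for a vector database, and the part IDs and descriptions are invented for illustration. Only the overall pattern is meant to carry over: embed each part description once, embed the query, and rank by cosine similarity.

```python
import math
import zlib
from collections import Counter

DIM = 256  # size of the toy vectors (an arbitrary choice for this sketch)


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model such as BGE-M3.

    Hashes character trigrams into a fixed-size vector and L2-normalizes it,
    so a dot product between two outputs equals their cosine similarity.
    """
    padded = f"  {text.lower()}  "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    vec = [0.0] * DIM
    for gram, n in counts.items():
        vec[zlib.crc32(gram.encode("utf-8")) % DIM] += n
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


class TinyVectorIndex:
    """Hypothetical in-memory index; a real system would use a vector database."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows = []  # list of (part_id, description, vector)

    def add(self, part_id: str, description: str) -> None:
        self.rows.append((part_id, description, self.embed(description)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Return the k best (score, part_id) pairs, highest score first."""
        q = self.embed(query)
        scored = [(cosine(q, vec), part_id) for part_id, _, vec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Invented example records, not taken from the SynVectorDB dataset.
index = TinyVectorIndex()
index.add("demo-001", "constitutive promoter for bacterial expression")
index.add("demo-002", "transcriptional terminator")
index.add("demo-003", "green fluorescent protein coding sequence")

for score, part_id in index.search("fluorescent reporter protein", k=2):
    print(f"{part_id}\t{score:.3f}")
```

Swapping `toy_embed` for a call into a real embedding model would leave the rest of the sketch unchanged, which is the main reason the embedding function is passed in as a parameter.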
Abstract
Synthetic biology part discovery faces significant challenges due to inconsistent data organization and limited semantic search capabilities across existing repositories. We developed SynVectorDB, an embedding-based retrieval system that addresses these limitations through methodological innovations in data integration and AI-driven semantic search. Our approach integrates 19,850 biological parts from multiple sources (Addgene, iGEM Registry, laboratory collections), implementing systematic curation protocols that resulted in 7,656 parts achieving verified status through literature-based validation and reliability assessment. We introduce a novel three-level hierarchical classification system organizing parts into functionally coherent categories (DNA Elements, RNA Elements, Coding Sequences, and Application Constructs) with detailed subcategorization. The core technical contribution…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Genomics and Phylogenetic Studies · Gene expression and cancer classification
