TL;DR
This paper introduces CRC-Screen, a hazard screening method for DNA synthesis that remains reliable under taxonomic shifts by using calibrated signals and conformal risk control.
Contribution
It presents a novel hazard screening approach that certifies low false-negative rates despite taxonomic gaps, leveraging multiple signals and conformal risk control.
Findings
Achieves 0% test miss rate on most folds at α=0.05
Certifies a maximum 1.77% false-negative rate with finite-sample guarantees
Requires larger calibration sets for more stringent thresholds
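The calibration-size finding is consistent with the standard Conformal Risk Control bound for a 0/1 miss loss, (misses + 1) / (n + 1) ≤ α. No threshold can satisfy this unless 1 / (n + 1) ≤ α. The sketch below works through that arithmetic, assuming the standard bound applies to the paper's setup and that n counts hazardous calibration examples. The function name `min_calibration_size` is hypothetical.

```python
import math
from fractions import Fraction

def min_calibration_size(alpha: str) -> int:
    """Smallest n satisfying 1 / (n + 1) <= alpha.

    Assumption: the standard CRC bound for a loss bounded by 1.
    alpha is passed as a decimal string so the arithmetic is exact.
    """
    a = Fraction(alpha)
    if not 0 < a < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # 1 / (n + 1) <= alpha  <=>  n >= 1 / alpha - 1
    return math.ceil(1 / a) - 1

if __name__ == "__main__":
    for alpha in ("0.1", "0.05", "0.01"):
        print(alpha, min_calibration_size(alpha))
```

Under this assumption, α=0.05 needs at least 19 hazardous calibration examples and α=0.01 needs at least 99, so tightening the target raises the minimum roughly in proportion to 1/α.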
Abstract
DNA-synthesis providers screen incoming orders by searching the requested sequence against curated hazard lists. We show that this baseline collapses to a 100% false-flag rate when the hazardous sequence comes from a taxonomic family absent from the reference set: under Conformal Risk Control's certified miss-rate constraint, a low-discrimination signal forces the threshold below the entire test-benign mass. We compose three signals derived from a synthesis order's public annotation: k-mer Jaccard similarity to known toxins, the trimmed-mean score of a five-LLM judge panel, and cosine similarity to clustered embedding centroids. Fused under a monotone logistic aggregator and calibrated by Conformal Risk Control, the resulting screener certifies a bounded false-negative rate. Across ten leave-one-taxonomic-family-out folds at α=0.05 on UniProt KW-0800 reviewed toxins,…
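The threshold mechanism described in the abstract can be illustrated with a minimal sketch of Conformal Risk Control for a miss-rate constraint. It is not the authors' implementation. It assumes a single fused score per order, that an order is flagged when its score is at least the threshold, and that calibration uses only hazardous examples. The names `crc_threshold` and `false_flag_rate` are hypothetical.

```python
import math

def crc_threshold(hazard_scores, alpha):
    """Largest flagging threshold tau with (misses + 1) / (n + 1) <= alpha.

    A miss is a hazardous calibration score strictly below tau.
    Returns -inf when no finite threshold meets the bound, which means
    every order must be flagged.
    """
    n = len(hazard_scores)
    # Number of calibration misses the bound can tolerate.
    budget = math.floor(alpha * (n + 1)) - 1
    if budget < 0:
        return float("-inf")
    ordered = sorted(hazard_scores)
    # Setting tau to the score at index `budget` leaves at most
    # `budget` scores strictly below it.
    return ordered[min(budget, n - 1)]

def false_flag_rate(benign_scores, tau):
    """Fraction of benign orders flagged at threshold tau."""
    flagged = sum(1 for s in benign_scores if s >= tau)
    return flagged / len(benign_scores)

if __name__ == "__main__":
    hazards = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    benign = [0.05, 0.25, 0.3, 0.9]
    tau = crc_threshold(hazards, alpha=0.25)
    print(tau, false_flag_rate(benign, tau))
```

In this sketch, a signal that gives hazardous orders low scores pushes the threshold down. Once the threshold falls below every benign score, the false-flag rate reaches 100%, which is the failure mode the abstract attributes to the list-search baseline.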
