TL;DR
This paper introduces a compact structural representation of DNA that identifies functional DNA regions and variants more accurately than traditional sequence-based methods and discriminates DNA substrates better.
Contribution
The authors developed a compact structural encoding of DNA that captures key properties of protein-DNA interactions, enhancing sequence-based algorithms for regulatory region analysis.
Findings
The most accurate structural representations compress functional DNA sequence variants by 30% to 50%, with each instance encoding from tens to thousands of sequences.
Structural distance metrics outperform sequence-based metrics in discriminating DNA substrates.
A new alignment algorithm demonstrates improved accuracy using structural DNA representations.
Abstract
The nucleotide sequence representation of DNA can be inadequate for resolving protein-DNA binding sites and regulatory substrates, such as those involved in gene expression and horizontal gene transfer. Considering that sequence-like representations are algorithmically very useful, here we fused over 60 currently available DNA physicochemical and conformational variables into compact structural representations that can encode single DNA binding sites to whole regulatory regions. We find that the main structural components reflect key properties of protein-DNA interactions and can be condensed to the amount of information found in a single nucleotide position. The most accurate structural representations compress functional DNA sequence variants by 30% to 50%, as each instance encodes from tens to thousands of sequences. We show that a structural distance function discriminates among…
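The idea of fusing many physicochemical variables into a compact structural code, and then comparing sequences by a structural distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the property table is filled with synthetic placeholder numbers, the fusion step is plain principal component analysis (first component only, via power iteration), and the names `synthetic_property_table`, `first_component`, `encode`, and `structural_distance` are hypothetical. It assumes properties are defined per dinucleotide step, which is a common convention but is not confirmed by the abstract.

```python
import math
import random
from itertools import product

# The 16 dinucleotide steps; assumed here to be the unit that properties describe.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def synthetic_property_table(n_props=60, seed=0):
    """Placeholder values only: these are NOT real physicochemical measurements."""
    rng = random.Random(seed)
    return {d: [rng.gauss(0.0, 1.0) for _ in range(n_props)] for d in DINUCS}


def standardize(table):
    """Z-score each property across the 16 dinucleotides."""
    n_props = len(table[DINUCS[0]])
    stats = []
    for j in range(n_props):
        vals = [table[d][j] for d in DINUCS]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        stats.append((mean, sd))
    return {
        d: [(table[d][j] - stats[j][0]) / stats[j][1] for j in range(n_props)]
        for d in DINUCS
    }


def first_component(table, iters=200):
    """Approximate the leading principal axis by power iteration on X^T X."""
    rows = [table[d] for d in DINUCS]
    n_props = len(rows[0])
    v = [1.0 / math.sqrt(n_props)] * n_props
    for _ in range(iters):
        xv = [sum(r[j] * v[j] for j in range(n_props)) for r in rows]
        w = [sum(xv[i] * rows[i][j] for i in range(len(rows))) for j in range(n_props)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def dinucleotide_scores(table, axis):
    """Project every dinucleotide's property vector onto the axis."""
    return {d: sum(a * b for a, b in zip(table[d], axis)) for d in DINUCS}


def encode(seq, scores):
    """Structural profile: one score per overlapping dinucleotide step."""
    seq = seq.upper()
    return [scores[seq[i:i + 2]] for i in range(len(seq) - 1)]


def structural_distance(seq_a, seq_b, scores):
    """Euclidean distance between two equal-length structural profiles."""
    pa, pb = encode(seq_a, scores), encode(seq_b, scores)
    if len(pa) != len(pb):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


table = standardize(synthetic_property_table())
axis = first_component(table)
scores = dinucleotide_scores(table, axis)
profile = encode("ACGTAC", scores)  # 5 values for a 6-nt sequence
```

With a real property table in place of the synthetic one, `structural_distance("ACGT", "ACGA", scores)` would compare two sites by shape-like features rather than by letter identity. Discretizing the projected scores into a small alphabet would be one way to approach the single-nucleotide information content the abstract mentions, though that step is omitted here.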
