Dilated Convolutions for Modeling Long-Distance Genomic Dependencies

Ankit Gupta; Alexander M. Rush

arXiv:1710.01278·q-bio.GN·October 4, 2017·1 cites

TL;DR

This paper applies dilated convolutional neural networks to model long-distance dependencies in genomic data, improving the detection of regulatory elements across large DNA regions.

Contribution

The paper develops and releases a novel dataset for large-context genomic modeling and shows that dilated convolutions capture long-range interactions more effectively than standard convolutions and recurrent neural networks.

Findings

01. Dilated convolutions outperform standard convolutions and RNNs in locating regulatory markers.

02. Dilated CNNs effectively model long-distance genomic dependencies.

03. New dataset enables large-context genomic analysis.
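
As a rough illustration of why dilation helps with long-distance dependencies, the sketch below computes the receptive field of a stack of stride-1 1D convolutions. The kernel size of 3, the depth of 6 layers, and the doubling dilation schedule are illustrative assumptions, not the configuration reported in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in input positions, of stacked stride-1 1D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Assumed example settings: six layers, kernel size 3.
standard = receptive_field(3, [1] * 6)               # no dilation
dilated = receptive_field(3, [1, 2, 4, 8, 16, 32])   # dilation doubles per layer

print(standard, dilated)  # prints "13 127"
```

With the same number of layers and parameters, the undilated stack grows its context linearly with depth, while the doubling schedule grows it exponentially.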

Abstract

We consider the task of detecting regulatory elements in the human genome directly from raw DNA. Past work has focused on small snippets of DNA, making it difficult to model long-distance dependencies that arise from DNA's 3-dimensional conformation. In order to study long-distance dependencies, we develop and release a novel dataset for a larger-context modeling task. Using this new dataset, we model long-distance interactions using dilated convolutional neural networks, and compare them to standard convolutions and recurrent neural networks. We show that dilated convolutions are effective at modeling the locations of regulatory markers in the human genome, such as transcription factor binding sites, histone modifications, and DNase hypersensitivity sites.
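
To make the core operation concrete, here is a minimal pure-Python sketch of a single dilated 1D convolution applied to one-hot encoded DNA. It is not the authors' implementation: the helper names `one_hot` and `dilated_conv1d`, the hand-set filter, and the toy sequence are all hypothetical and chosen only to show how spacing the filter taps lets one filter relate two distant bases.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 channels; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded ('same') dilated 1D convolution with one output channel.

    x:      list of length L, each item a list of C input channels
    kernel: list of K taps, each a list of C weights
    Returns one value per input position.
    """
    length, taps = len(x), len(kernel)
    center = (taps - 1) // 2
    out = []
    for i in range(length):
        total = 0.0
        for k in range(taps):
            j = i + (k - center) * dilation  # taps sit `dilation` positions apart
            if 0 <= j < length:              # out-of-range positions act as zero padding
                total += sum(w * v for w, v in zip(kernel[k], x[j]))
        out.append(total)
    return out


# Hypothetical hand-set filter: the left tap looks for A, the right tap for T.
kernel = [[1.0, 0.0, 0.0, 0.0],   # left tap: A
          [0.0, 0.0, 0.0, 0.0],   # center tap: ignored
          [0.0, 0.0, 0.0, 1.0]]   # right tap: T

x = one_hot("A" + "C" * 7 + "T")  # A and T are 8 positions apart

print(max(dilated_conv1d(x, kernel, dilation=1)))  # prints 1.0: taps never see both bases
print(dilated_conv1d(x, kernel, dilation=4)[4])    # prints 2.0: both bases fall under the taps
```

In a trained network the weights are learned rather than set by hand, and many such filters are stacked with increasing dilation.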

Code & Models

Repositories

harvardnlp/regulatory-prediction · tf · Official

Taxonomy

Topics: Genomics and Chromatin Dynamics · RNA and protein synthesis mechanisms · Machine Learning in Bioinformatics