Capacity of DNA Data Embedding Under Substitution Mutations

Félix Balado

arXiv:1101.3457·cs.IT·October 9, 2014

TL;DR

This paper derives the maximum information capacity of DNA data embedding when the host DNA sequence undergoes substitution mutations, modelled with the Kimura model, and discusses the biological implications.

Contribution

It derives the Shannon capacity for DNA data embedding under substitution mutations modeled by the Kimura model, linking information theory with molecular evolution.

Findings

01 Capacity depends on the mutation rates and on whether the host DNA is coding or noncoding

02 Biological constraints influence embedding limits

03 Results inform DNA watermarking robustness

Abstract

A number of methods have been proposed over the last decade for encoding information using deoxyribonucleic acid (DNA), giving rise to the emerging area of DNA data embedding. Since a DNA sequence is conceptually equivalent to a sequence of quaternary symbols (bases), DNA data embedding (diversely called DNA watermarking or DNA steganography) can be seen as a digital communications problem where channel errors are tantamount to mutations of DNA bases. Depending on the use of coding or noncoding DNA hosts, which, respectively, denote DNA segments that can or cannot be translated into proteins, DNA data embedding is essentially a problem of communications with or without side information at the encoder. In this paper the Shannon capacity of DNA data embedding is obtained for the case in which DNA sequences are subject to substitution mutations modelled using the Kimura model from…
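As a rough illustration of the channel view described in the abstract, the sketch below computes the capacity of a single use of a Kimura-style substitution channel for a noncoding host (no side information at the encoder). It is a minimal sketch under stated assumptions, not the paper's derivation: the parameter names `p_ti` (probability of a transition, A<->G or C<->T) and `p_tv` (probability of each of the two possible transversions) are hypothetical, and only one mutation stage is considered. Because every row of such a matrix is a permutation of the same four probabilities, the channel is symmetric and its capacity is 2 bits minus the entropy of one row.

```python
import math


def row_entropy(probs):
    """Shannon entropy in bits of a probability vector (0 * log 0 taken as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def kimura_capacity(p_ti, p_tv):
    """Capacity in bits per base of one use of a symmetric Kimura-style channel.

    p_ti: probability that a base undergoes a transition (hypothetical name).
    p_tv: probability of each of the two transversions (hypothetical name).
    Assumes a noncoding host and a single mutation stage.
    """
    p_same = 1.0 - p_ti - 2.0 * p_tv  # probability that the base is unchanged
    if min(p_same, p_ti, p_tv) < 0:
        raise ValueError("probabilities must be non-negative and sum to 1")
    # Symmetric channel over 4 symbols: C = log2(4) - H(row).
    return 2.0 - row_entropy([p_same, p_ti, p_tv, p_tv])


if __name__ == "__main__":
    for p_ti, p_tv in [(0.0, 0.0), (0.01, 0.005), (0.25, 0.25)]:
        print(p_ti, p_tv, round(kimura_capacity(p_ti, p_tv), 4))
```

With no mutations the function returns the full 2 bits per base, and when all four outcomes are equally likely it returns 0, which are the two extremes one would expect of any quaternary channel.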
