Alignment of protein-coding sequences with frameshift extension penalties
François Bélanger, Aïda Ouangraoua

TL;DR
This paper presents an algorithm for aligning protein-coding sequences that adds a penalty for frameshift extensions, so that the cost of a frameshifted portion of an alignment reflects both how far the frameshift extends and the varying substitution scores of the codons involved.
Contribution
The algorithm introduces a frameshift extension penalty, where previous methods used only constant frameshift penalties, and it searches the full set of possible alignments between two coding sequences without length constraints.
Findings
Handles frameshift extensions with variable penalties
Maintains classical asymptotic complexity
Allows comprehensive alignment search space
Abstract
We introduce an algorithm for the alignment of protein-coding sequences accounting for frameshifts. The main specificity of this algorithm as compared to previously published protein-coding sequence alignment methods is the introduction of a penalty cost for frameshift extensions. Previous algorithms have only used constant frameshift penalties. This is similar to the use of scoring schemes with affine gap penalties in classical sequence alignment algorithms. However, the overall penalty of a frameshift portion in an alignment cannot be formulated as an affine function, because it should also incorporate varying codon substitution scores. The second specificity of the algorithm is its search space being the set of all possible alignments between two coding sequences, under the classical definition of an alignment between two DNA sequences. Previous algorithms have introduced…
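The contrast the abstract draws between an affine gap penalty and the penalty of a frameshift portion can be illustrated with a minimal sketch. This is not the authors' scoring scheme: the function names, the parameter values, and the exact form of `frameshift_portion_cost` are assumptions made only for illustration. The affine convention used here (opening cost plus extension cost times length) is one of several in common use.

```python
def affine_gap_cost(length, gap_open=-10.0, gap_extend=-1.0):
    """Classical affine gap penalty: the cost depends only on the gap length.

    Convention assumed here: gap_open + gap_extend * length.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length


def frameshift_portion_cost(codon_scores, fs_open=-10.0, fs_extend=-1.0):
    """Hypothetical cost of one frameshifted portion of an alignment.

    codon_scores holds the substitution scores of the codons covered by the
    portion. The cost adds an opening penalty, a per-codon extension penalty,
    and the substitution scores themselves, so it is not a function of the
    length alone.
    """
    if not codon_scores:
        return 0.0
    return fs_open + fs_extend * len(codon_scores) + sum(codon_scores)


# Two gaps of equal length always cost the same under the affine model.
same_length_gaps = (affine_gap_cost(3), affine_gap_cost(3))

# Two frameshifted portions of equal length can cost different amounts,
# because their codon substitution scores differ.
portion_a = frameshift_portion_cost([2.0, -1.0, 4.0])
portion_b = frameshift_portion_cost([0.0, 0.0, 0.0])
```

Under these assumed parameters, the two portions have the same length but different costs, which is the sense in which the overall penalty cannot be written as an affine function of length.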
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Machine Learning in Bioinformatics
