A Structured Hardware Software Architecture for Peptide Based Diagnosis - Sub-string Matching Problem with Limited Tolerance (ICIAfS14)
S.M.Vidanagamachchi, S.D. Dewasurendra, R.G. Ragel, M. Niranjan

TL;DR
This paper introduces a model-based peptide inference method that efficiently identifies proteins with variations, using a new substring matching approach with limited tolerance, validated on proteomic data with significant speed improvements.
Contribution
It proposes the Sub-string Matching Problem with Limited Tolerance (SMPLT) and a workflow for protein inference that handles variations, improving speed and accuracy over existing methods.
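As a rough illustration of what substring matching with limited tolerance can mean, the sketch below scans protein sequences for peptides that occur with at most `k` substituted residues. This is a naive software baseline under stated assumptions, not the paper's hardware-software architecture: the summary does not give the formal SMPLT definition, so tolerance is assumed here to mean substitutions only (Hamming distance). The function names and the toy sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def hamming_within(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False  # stop early once the tolerance is exceeded
    return True


def match_peptides(
    proteins: Dict[str, str], peptides: List[str], k: int
) -> Dict[str, List[Tuple[str, int]]]:
    """For each peptide, list (protein_id, offset) pairs where it occurs
    as a substring with at most k substitutions (assumed tolerance model)."""
    hits: Dict[str, List[Tuple[str, int]]] = {p: [] for p in peptides}
    for pep in peptides:
        m = len(pep)
        for pid, seq in proteins.items():
            # Slide a window of the peptide's length over the protein sequence.
            for i in range(len(seq) - m + 1):
                if hamming_within(pep, seq[i:i + m], k):
                    hits[pep].append((pid, i))
    return hits


# Hypothetical toy data: P2 differs from P1 by one residue (I -> L).
proteins = {"P1": "MKTAYIAKQR", "P2": "MKTAYLAKQR"}
peptides = ["AYIAK", "QRQ"]
result = match_peptides(proteins, peptides, k=1)
```

With `k=1` the peptide `AYIAK` is reported in both toy proteins, while with `k=0` only the exact occurrence in `P1` remains. The scan costs time proportional to the number of peptides times the total sequence length times the peptide length, which gives a sense of why large peptide sets and protein databases motivate acceleration.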
Findings
Achieved up to a 70-fold speedup in protein identification.
Validated the approach on the UNIPROT data set.
Applicable to inexact multiple pattern matching problems.
Abstract
The problem of inferring proteins from complex peptide samples in a shotgun proteomics workflow places extreme demands on computational resources. This is exacerbated by the fact that, in general, a given protein cannot be defined by a fixed sequence of amino acids, owing to the existence of splice variants and isoforms of that protein. Therefore, the problem of protein inference can be considered one of identifying sequences of amino acids with some limited tolerance. Two problems arise from this: a) because of these variations, the applicability of exact string matching methodologies is questionable, and b) it is difficult to define a reference sequence for a set of proteins that are functionally indistinguishable but vary in some features. This paper presents a model-based inference approach that is developed and validated to solve the inference problem. Our…
Taxonomy
Topics: Advanced Proteomics Techniques and Applications · Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics
