How to measure the topological quality of protein grammars?
Witold Dyrka, François Coste, Olgierd Unold, Łukasz Culer, Agnieszka Kaczmarek

TL;DR
This paper proposes a set of measures to evaluate how well different grammatical models of proteins capture their structural topology, addressing a gap in systematic assessment of grammatical expressiveness.
Contribution
It introduces objective criteria and measures for comparing the topology of parse trees generated by grammars with actual protein structures.
Findings
Proposed measures enable systematic evaluation of grammatical models.
Initial results suggest varying levels of topological accuracy among models.
Framework facilitates future research in protein grammar assessment.
Abstract
Context-free and context-sensitive formal grammars are often regarded as more appropriate for modeling proteins than regular-level models such as finite state automata and Hidden Markov Models (HMMs). In theory, the claim is well founded in the fact that many biologically relevant interactions between residues of protein sequences have the character of nested or crossed dependencies. In practice, there is hardly any evidence that grammars of higher expressiveness have an edge over good old HMMs in typical applications, including recognition and classification of protein sequences. This is in contrast to RNA modeling, where context-free grammars (CFGs) power some of the most successful tools. Several explanations of this phenomenon have been proposed. On the biology side, one difficulty is that interactions in proteins are often less specific and more "collective" in comparison to RNA. On the modeling side, a difficulty…
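To make the distinction between nested and crossed dependencies concrete, here is a minimal Python sketch. It is an illustration only, not one of the measures proposed in the paper: the function names `relation` and `has_crossing` and the toy contact lists are hypothetical, and each dependency is assumed to be a pair of distinct sequence positions. Nested pairs are the kind of pattern a context-free parse tree can represent directly, while crossed pairs generally call for more expressive formalisms.

```python
from itertools import combinations


def relation(a, b):
    """Classify two dependencies, each a pair of sequence positions.

    Assumes all four positions are distinct (an assumption of this sketch).
    """
    if len({*a, *b}) != 4:
        raise ValueError("positions must be distinct")
    # Order each pair internally, then order the pairs by their left end.
    (i, j), (k, l) = sorted([tuple(sorted(a)), tuple(sorted(b))])
    if j < k:
        return "disjoint"  # i < j < k < l
    if l < j:
        return "nested"    # i < k < l < j
    return "crossed"       # i < k < j < l


def has_crossing(pairs):
    """Return True if any two dependencies in the list cross each other."""
    return any(relation(a, b) == "crossed" for a, b in combinations(pairs, 2))


# Hypothetical toy contacts: hairpin-like (antiparallel) pairing is nested,
# while pairing between parallel segments produces crossings.
hairpin_like = [(1, 10), (2, 9), (3, 8)]
parallel_like = [(1, 6), (2, 7), (3, 8)]

print(has_crossing(hairpin_like))   # False
print(has_crossing(parallel_like))  # True
```

A contact list without crossings could, in principle, be mirrored by the branching of a context-free parse tree; a list with crossings could not be fully captured that way.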
Taxonomy
Topics: Genomics and Phylogenetic Studies · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
