False Discovery Rate Controlling Procedures with BLOSUM62 substitution matrix and their application to HIV Data
Kyurhi Kim, Junyong Park, Dohwan Park, Mileiy Giraldo, Muriel Aldunate, John L. Spouge, Gilda Tachedjian

TL;DR
This paper introduces new statistical methods that incorporate biological substitution information via BLOSUM62 to improve the detection of significant sites in sparse HIV sequence data, outperforming traditional Fisher's test.
Contribution
It develops two novel models integrating biological substitution matrices into false discovery rate procedures for analyzing sparse biological data.
Findings
Proposed methods identify significant sites in HIV data.
Traditional Fisher's test fails to detect any sites.
Models leverage BLOSUM62 for improved detection.
Abstract
Identifying significant sites in sequence data and analogous data is of fundamental importance in many biological fields. Fisher's exact test is a popular technique; however, it is not appropriate for sparse count data because its decisions are too conservative. Since counts in HIV data are typically very sparse, it is crucial to incorporate additional information into statistical models to improve testing power. To develop new approaches that incorporate biological information into the false discovery rate controlling procedure, we propose two models: one based on an empirical Bayes model under independence of amino acids, and another that uses pairwise associations of amino acids through a Markov random field based on the BLOSUM62 substitution matrix. We apply the proposed methods to HIV data and identify significant sites by incorporating the BLOSUM62 matrix, while the traditional method based on Fisher's…
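To illustrate why the baseline described in the abstract struggles on sparse counts, here is a minimal sketch of a two-sided Fisher's exact test followed by a Benjamini-Hochberg step. It shows only the traditional baseline, not the proposed empirical Bayes or Markov random field models. The 2x2 counts are hypothetical and not taken from the paper, the function names are illustrative, and Benjamini-Hochberg is an assumption, since the abstract does not name the specific false discovery rate procedure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank  # largest rank whose p-value is under its threshold
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical sparse sites: (present in group A, absent in A,
#                             present in group B, absent in B)
tables = [(3, 7, 0, 10), (2, 8, 0, 10), (4, 6, 1, 9)]
pvals = [fisher_exact_two_sided(*t) for t in tables]
rejected = benjamini_hochberg(pvals, alpha=0.05)
```

With only ten sequences per group in this toy setup, even the most extreme split possible for the first table (3 versus 0) cannot produce a small p-value, so no site survives the multiple-testing correction. That is the kind of power loss the abstract attributes to sparse data.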
Taxonomy
Topics: Gene expression and cancer classification · HIV Research and Treatment · Machine Learning in Bioinformatics
