TL;DR
This paper introduces a non-parametric, allele-specific hidden Markov model (HMM) for annotating B cell receptor (BCR) sequences, reports improved accuracy over previous parametric models, and provides an efficient software implementation called partis.
Contribution
The paper develops a non-parametric HMM with per-allele parameters for BCR annotation, aiming to improve both inference accuracy and efficiency.
Findings
A model that uses allele-specific categorical distributions improves annotation accuracy.
The partis software enables efficient, high-accuracy BCR sequence analysis.
Non-parametric modeling captures complex mutation and recombination patterns.
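To make the contrast between the two modeling styles concrete, here is a minimal sketch in Python. It is not taken from partis: the function names, the smoothing pseudocount, and the toy deletion lengths are all assumptions made for illustration. It fits one categorical distribution per allele from observed counts, and compares that with a single geometric distribution shared by all alleles, standing in for the "simple parametric" baseline.

```python
from collections import Counter, defaultdict


def fit_categorical(observations, max_len=5, pseudocount=0.5):
    """Per-allele categorical distribution over deletion lengths 0..max_len.

    observations: iterable of (allele_name, deletion_length) pairs.
    The pseudocount is an assumed smoothing choice, not a value from the paper.
    """
    counts = defaultdict(Counter)
    for allele, length in observations:
        # Lengths beyond max_len are clamped into the last bin.
        counts[allele][min(length, max_len)] += 1
    dists = {}
    for allele, c in counts.items():
        total = sum(c.values()) + pseudocount * (max_len + 1)
        dists[allele] = [(c[k] + pseudocount) / total for k in range(max_len + 1)]
    return dists


def fit_geometric(observations):
    """One shared geometric distribution P(k) = p * (1 - p)**k (parametric baseline).

    Returns the maximum-likelihood estimate p = 1 / (1 + mean length).
    """
    lengths = [length for _, length in observations]
    mean = sum(lengths) / len(lengths)
    return 1.0 / (1.0 + mean)


# Made-up observations: (allele, number of bases deleted), for illustration only.
toy = [
    ("IGHV1-2*02", 0), ("IGHV1-2*02", 0), ("IGHV1-2*02", 3), ("IGHV1-2*02", 3),
    ("IGHV3-23*01", 1), ("IGHV3-23*01", 1), ("IGHV3-23*01", 2),
]

categorical = fit_categorical(toy)
p_shared = fit_geometric(toy)
```

In this toy data the first allele has deletion lengths clustered at 0 and 3. The categorical fit keeps both peaks, whereas a geometric distribution is forced to decrease monotonically with length, so it cannot represent that shape however its single parameter is chosen.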
Abstract
VDJ rearrangement and somatic hypermutation work together to produce antibody-coding B cell receptor (BCR) sequences for a remarkable diversity of antigens. It is now possible to sequence these BCRs in high throughput; analysis of these sequences is bringing new insight into how antibodies develop, in particular for broadly-neutralizing antibodies against HIV and influenza. A fundamental step in such sequence analysis is to annotate each base as coming from a specific one of the V, D, or J genes, or from an N-addition (a.k.a. non-templated insertion). Previous work has used simple parametric distributions to model transitions from state to state in a hidden Markov model (HMM) of VDJ recombination, and assumed that mutations occur via the same process across sites. However, codon frame and other effects have been observed to violate these parametric assumptions for such coding sequences,…
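The annotation task described in the abstract, labeling each base by its origin, can be sketched with a toy HMM decoded by the Viterbi algorithm. The sketch below is a simplified illustration and not the partis implementation: it has only V, N, and J states (no D segment), allows deletion only at the 3' end of V, and every germline string and probability in it is a hypothetical value chosen for the example.

```python
import math

V_GERMLINE = "ACGT"  # hypothetical toy germline V segment
J_GERMLINE = "GGA"   # hypothetical toy germline J segment
MU = 0.1             # assumed per-base mutation probability


def emit(state, base):
    """Emission probability of `base` from `state` = (kind, index)."""
    kind, idx = state
    if kind == "N":
        return 0.25  # non-templated insertion: uniform over A, C, G, T
    germline = V_GERMLINE if kind == "V" else J_GERMLINE
    return 1 - MU if base == germline[idx] else MU / 3


def transitions(state):
    """Outgoing transition probabilities (all values are assumptions)."""
    kind, idx = state
    if kind == "V":
        if idx + 1 < len(V_GERMLINE):
            # Leaving V before its last base models a 3' deletion.
            return {("V", idx + 1): 0.8, ("N", 0): 0.1, ("J", 0): 0.1}
        return {("N", 0): 0.5, ("J", 0): 0.5}
    if kind == "N":
        return {("N", 0): 0.5, ("J", 0): 0.5}
    if idx + 1 < len(J_GERMLINE):
        return {("J", idx + 1): 1.0}
    return {}  # last J base: no outgoing transitions


def viterbi(seq):
    """Return the most probable per-base labels as a string, or None."""
    start = ("V", 0)
    end = ("J", len(J_GERMLINE) - 1)
    # best[t][state] = (log probability of best path, previous state)
    best = [{start: (math.log(emit(start, seq[0])), None)}]
    for base in seq[1:]:
        column = {}
        for prev, (score, _) in best[-1].items():
            for nxt, p in transitions(prev).items():
                cand = score + math.log(p) + math.log(emit(nxt, base))
                if nxt not in column or cand > column[nxt][0]:
                    column[nxt] = (cand, prev)
        best.append(column)
    if end not in best[-1]:
        return None  # no path ends on the final J base
    # Trace back from the end state to recover the path.
    path = [end]
    for t in range(len(seq) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return "".join(kind for kind, _ in path)


labels = viterbi("ACGTCCGGA")
print(labels)  # VVVVNNJJJ
```

Each output character labels the base at the same position in the query. In a non-parametric, allele-specific variant of this sketch, the hard-coded transition and emission numbers would be replaced by categorical distributions estimated separately for each allele.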
