Linkage-based ortholog refinement in bacterial pangenomes with CLARC
Indra González Ojeda, Samantha G Palace, Pamela P Martinez, Taj Azarian, Lindsay R Grant, Laura L Hammitt, William P Hanage, Marc Lipsitch

TL;DR
CLARC improves bacterial pangenome analysis by refining gene groupings, reducing overestimation of accessory genes and enhancing evolutionary predictions.
Contribution
CLARC introduces a novel method for refining ortholog groups using functional and linkage data, reducing accessory gene overestimation.
Findings
CLARC reduced accessory gene estimates by over 30% in Streptococcus pneumoniae.
The method improves evolutionary predictions based on accessory gene frequencies.
CLARC is broadly applicable across different bacterial species.
Abstract
Bacterial genomes exhibit significant variation in gene content and sequence identity. Pangenome analyses explore this diversity by classifying genes into core and accessory clusters of orthologous groups (COGs). However, strict sequence identity cutoffs can misclassify divergent alleles as different genes, inflating accessory gene counts. CLARC (Connected Linkage and Alignment Redefinition of COGs) (https://github.com/IndraGonz/CLARC) improves pangenome analyses by condensing accessory COGs using functional annotation and linkage information. Through this approach, orthologous groups are consolidated into more practical units of selection. Analyzing 8000+ Streptococcus pneumoniae genomes, CLARC reduced accessory gene estimates by >30% and improved evolutionary predictions based on accessory gene frequencies. CLARC is effective across different bacterial species, making it a broadly…
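The condensing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the CLARC implementation. It assumes a simplified merge rule: two accessory COGs are treated as divergent alleles of one gene when they never co-occur in the same genome and carry the same functional annotation. The function name `condense_cogs`, the 5%/95% accessory thresholds, the COG names, and the annotation strings are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def condense_cogs(presence, annotation, n_genomes, low=0.05, high=0.95):
    """Merge accessory COGs that look like divergent alleles of one gene.

    presence:   dict mapping COG name -> set of genome IDs carrying it
    annotation: dict mapping COG name -> functional annotation string
    The frequency thresholds and the merge rule are illustrative
    assumptions, not values taken from CLARC.
    """
    # Accessory COGs: present in some genomes, but not in nearly all.
    accessory = [c for c, g in presence.items()
                 if low <= len(g) / n_genomes <= high]

    # Union-find so that chains of linked COGs end up in one cluster.
    parent = {c: c for c in accessory}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    for a, b in combinations(accessory, 2):
        never_cooccur = presence[a].isdisjoint(presence[b])
        same_function = annotation[a] == annotation[b]
        if never_cooccur and same_function:
            parent[find(a)] = find(b)

    # Collect each cluster's members and the union of their genomes.
    clusters = {}
    for c in accessory:
        members, genomes = clusters.setdefault(find(c), ([], set()))
        members.append(c)
        genomes |= presence[c]  # in-place update of the stored set
    return [(sorted(m), g) for m, g in clusters.values()]


# Hypothetical toy data: 10 genomes, 3 accessory COGs, 1 core COG.
presence = {
    "cogA_1": {0, 1, 2, 3},
    "cogA_2": {4, 5, 6},
    "cogB": {0, 4, 7},
    "cogCore": set(range(10)),
}
annotation = {
    "cogA_1": "function X",
    "cogA_2": "function X",
    "cogB": "function Y",
    "cogCore": "function Z",
}
condensed = condense_cogs(presence, annotation, n_genomes=10)
for members, genomes in condensed:
    print(members, len(genomes) / 10)
```

In this toy input, `cogA_1` and `cogA_2` are merged into one unit with a combined frequency of 0.7, while `cogB` stays separate because it co-occurs with both. Because the merge is transitive, a chain of pairwise-exclusive COGs can join two COGs that do co-occur; a stricter variant would require every pair in a cluster to be mutually exclusive.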
Taxonomy
Topics: Genomics and Phylogenetic Studies · Bacterial Identification and Susceptibility Testing · Bacterial Genetics and Biotechnology
