MicroFuse: Protein-to-Genome Expert Fusion for Microbial Operon Reasoning
Seungik Cho

TL;DR
MicroFuse fuses protein-level and genome-context representations to improve microbial operon prediction, with the largest gains in ambiguous cases where the two signals conflict.
Contribution
The paper introduces a protein-to-genome fusion method built on a four-expert Mixture-of-Experts module with a learned soft router, together with a new large benchmark dataset for operon prediction.
Findings
MicroFuse outperforms existing baselines on the OG-Operon100K benchmark.
Cross-modal contrastive alignment is crucial for performance.
Largest gains occur in biologically ambiguous cases with conflicting signals.
Abstract
Predicting microbial operon co-membership requires integrating two complementary biological signals: protein-scale molecular identity and genome-context organization. While recent biological foundation models provide powerful representations of each view independently, naive concatenation of these modalities ignores a key biological property -- protein identity and genomic context may agree when adjacent genes form a coherent functional module, or conflict when sequence similarity is misleading but genomic layout indicates independent regulation. We present MicroFuse, a protein-to-genome expert fusion framework that integrates structure-aware protein representations from ProstT5 with genome-context representations from Bacformer through a four-expert Mixture-of-Experts module (protein, genome-context, agreement, and conflict experts) with a learned soft router. Training combines binary…
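To make the four-expert design concrete, here is a minimal NumPy sketch of soft-routed expert fusion. It is an illustration under stated assumptions, not the authors' implementation: the abstract names the four experts and the soft router but does not say how they are computed. The sketch assumes both embeddings have already been projected to a shared dimension, uses an element-wise product as a stand-in for the agreement signal and an absolute difference for the conflict signal, and treats each expert as a single tanh layer. The names `init_params`, `fuse`, `W_router`, and `w_out` are hypothetical.

```python
import numpy as np

EXPERTS = ["protein", "genome", "agreement", "conflict"]

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(dim, hidden, rng):
    # Hypothetical parameter layout: one linear map per expert,
    # a router over the concatenated views, and an output vector.
    return {
        "W": {n: rng.normal(0.0, 0.1, (dim, hidden)) for n in EXPERTS},
        "W_router": rng.normal(0.0, 0.1, (2 * dim, len(EXPERTS))),
        "w_out": rng.normal(0.0, 0.1, hidden),
    }

def fuse(protein, genome, params):
    """Return co-membership probabilities and router weights for a batch.

    protein, genome: arrays of shape (batch, dim), assumed to be already
    projected to a common dimension (an assumption of this sketch).
    """
    # Assumed expert inputs: the two raw views plus simple interaction terms.
    inputs = {
        "protein": protein,
        "genome": genome,
        "agreement": protein * genome,         # large where both views align
        "conflict": np.abs(protein - genome),  # large where the views differ
    }
    # Each expert is one tanh layer; stacked shape is (batch, 4, hidden).
    expert_out = np.stack(
        [np.tanh(inputs[n] @ params["W"][n]) for n in EXPERTS], axis=1
    )
    # Soft router: a distribution over the four experts for each example.
    router_in = np.concatenate([protein, genome], axis=-1)
    weights = softmax(router_in @ params["W_router"])  # (batch, 4)
    # Weighted sum of expert outputs, then a logistic output unit.
    fused = (weights[:, :, None] * expert_out).sum(axis=1)  # (batch, hidden)
    logit = fused @ params["w_out"]
    prob = 1.0 / (1.0 + np.exp(-logit))
    return prob, weights

rng = np.random.default_rng(0)
params = init_params(dim=16, hidden=8, rng=rng)
protein_emb = rng.normal(size=(5, 16))  # stand-in for protein embeddings
genome_emb = rng.normal(size=(5, 16))   # stand-in for genome-context embeddings
prob, weights = fuse(protein_emb, genome_emb, params)
```

Because the router is soft, every expert contributes to every prediction, and each row of `weights` shows how much the model leaned on the protein, genome-context, agreement, or conflict expert for that gene pair.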
