HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts
Wenxuan Huang, Mingyu Tsoi, Yanhao Huang, Xinjie Mao, Xue Xia, Hao Wu, Jiaqi Wei, Yuejin Yang, Lang Yu, Cheng Tan, Xiang Zhang, Zhangyang Gao, Siqi Sun

TL;DR
HarmonyCell is an automated framework that addresses semantic and distribution heterogeneity in single-cell perturbation studies, enabling scalable, dataset-agnostic modeling with high validity and competitive performance.
Contribution
It introduces a novel end-to-end agent combining LLM-driven metadata unification and adaptive architecture search to handle heterogeneity in single-cell data analysis.
Findings
Achieves 95% valid execution rate on diverse datasets.
Outperforms general agents in out-of-distribution tests.
Matches or exceeds expert-designed baselines.
Abstract
Single-cell perturbation studies face dual heterogeneity bottlenecks: (i) semantic heterogeneity--identical biological concepts encoded under incompatible metadata schemas across datasets; and (ii) statistical heterogeneity--distribution shifts from biological variation demanding dataset-specific inductive biases. We propose HarmonyCell, an end-to-end agent framework resolving each challenge through a dedicated mechanism: an LLM-driven Semantic Unifier autonomously maps disparate metadata into a canonical interface without manual intervention; and an adaptive Monte Carlo Tree Search engine operates over a hierarchical action space to synthesize architectures with optimal statistical inductive biases for distribution shifts. Evaluated across diverse perturbation tasks under both semantic and distribution shifts, HarmonyCell achieves a 95% valid execution rate on heterogeneous input…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSingle-cell and spatial transcriptomics · Cell Image Analysis Techniques · Gene Regulatory Network Analysis
